
Provide a way to clean up old lock files when using use_lockfile with versioning enabled s3 bucket #36445

Open
minamijoyo opened this issue Feb 6, 2025 · 3 comments
Labels: backend/s3, enhancement, new (new issue not yet triaged)

Comments

@minamijoyo
Contributor

Terraform Version

Terraform v1.10.5
on darwin_arm64

Use Cases

With the use_lockfile of the s3 backend introduced in Terraform v1.10, lock files can be managed in S3 without DynamoDB.
https://developer.hashicorp.com/terraform/language/backend/s3

A lock file with the .tflock extension is created/deleted next to the tfstate file. The problem is that the s3 bucket for tfstate is typically versioning enabled. As a result, every time I run terraform plan/apply, the entire history of lock file creation/deletion is kept in the version history. The lock file is tiny but not empty, which adds unnecessary cost. Unfortunately, I run the terraform plan command daily on all directories for drift checks, which makes the situation even worse.
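For context, a minimal backend configuration that enables the S3-native lock file looks like this (the bucket, key, and region are placeholders):

terraform {
  backend "s3" {
    bucket       = "mybucket"
    key          = "path/to/terraform.tfstate"
    region       = "ap-northeast-1"
    use_lockfile = true # creates/deletes path/to/terraform.tfstate.tflock on every lock/unlock
  }
}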

Attempted Solutions

I initially considered deleting the old lock files using S3 lifecycle rules, but this doesn't work because a filter condition can use a prefix (e.g., tflock/) but not a suffix (e.g., .tflock). Deleting based on object size is too risky, as it depends on implementation details.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLifecycleConfiguration.html

A filter identifying a subset of objects to which the rule applies. The filter can be based on a key name prefix, object tags, object size, or any combination of these.

Proposal

It would be helpful if the lock file could be matched by an S3 lifecycle rule filter, for example by storing it under a specified prefix or giving it an object tag. If I understand correctly, this requires additional functionality in the backend. However, please let me know if you have any recommendations for solving the problem with the currently available features.
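For example, if the backend stored lock files under a dedicated prefix (a hypothetical layout, not current behavior), a plain prefix filter would be enough:

resource "aws_s3_bucket_lifecycle_configuration" "example" {
  bucket = "mybucket"

  rule {
    id     = "cleanup-tflock"
    status = "Enabled"

    # Hypothetical: assumes lock files were written under a "tflock/" prefix,
    # which the s3 backend does not do today.
    filter {
      prefix = "tflock/"
    }

    noncurrent_version_expiration {
      noncurrent_days = 1
    }
  }
}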

References

No response

@minamijoyo minamijoyo added enhancement new new issue not yet triaged labels Feb 6, 2025
@minamijoyo
Contributor Author

I estimated the cost and found it to be almost zero. I'm just concerned about the ever-growing history, so if anyone has recommendations for lifecycle settings, please let me know.

@minamijoyo minamijoyo changed the title Provide a cost-effective way to use_lockfile with versioning enabled s3 bucket Provide a recommended way to use_lockfile with versioning enabled s3 bucket Feb 6, 2025
@bschaatsbergen
Member

bschaatsbergen commented Feb 6, 2025

Hey @minamijoyo,

Thank you for reporting this! The S3 backend is managed by the AWS Provider team at HashiCorp, and this issue has been added to their triage queue.

It's true that delete markers remain in an object’s version history in an S3 bucket even after the object is deleted, along with a reference to the deleted version, so the file is not actually removed from S3. That’s simply how S3 versioning works. It also seems that S3 lifecycle rules can't clean up based on suffixes, only prefixes and tags; see API_PutBucketLifecycleConfiguration.

I believe the costs are negligible, but it’s worth exploring optimizations. Any improvements here are always valuable. While you can work around the S3 lifecycle policy’s suffix limitation by specifying the full key for the lock file, we could explore whether tagging the lock file could be a solution.
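As a sketch of that per-key workaround: a lifecycle prefix can be a full object key, so a rule can target the lock file that is created next to a known state file (the key below is a placeholder):

resource "aws_s3_bucket_lifecycle_configuration" "example" {
  bucket = "mybucket"

  rule {
    id     = "cleanup-tflock-single-key"
    status = "Enabled"

    # The prefix is the full key of the lock file, which is created
    # next to the state file as "<state key>.tflock".
    filter {
      prefix = "path/to/terraform.tfstate.tflock"
    }

    noncurrent_version_expiration {
      noncurrent_days = 1
    }
  }
}

This needs one rule per state key, so it only scales to a limited number of states.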

Another thought, and common practice, is to cap the non-current versioned objects in your Terraform state bucket, for example, limiting it to a maximum of N non-current versions. This would ensure that both your Terraform state file and lock file don’t exceed more than N non-current versions, keeping S3, Google Cloud and Azure costs predictable.

It's good to note that this is also happening on Google Cloud using the gcs backend, and likely the azurerm backend too—as they share a similar implementation. I believe this issue wasn’t raised there due to the negligible costs involved in general, and perhaps they have better lifecycle policies in place. I'll look into that too, and get back to you. Thanks again!

@minamijoyo minamijoyo changed the title Provide a recommended way to use_lockfile with versioning enabled s3 bucket Provide a way to clean up old lock files when using use_lockfile with versioning enabled s3 bucket Feb 6, 2025
@minamijoyo
Contributor Author

I will share what I have learned about the options for S3 lifecycle rules when using the use_lockfile feature.

The ideal solution would be for AWS to support suffix filters in lifecycle rules, but whether that will happen is uncertain.

A practical solution under the current constraints is to cap the number of non-current object versions.
Here is an example of an S3 lifecycle rule:

resource "aws_s3_bucket_lifecycle_configuration" "example" {
  bucket = "mybucket"

  rule {
    id     = "cap-max-versions"
    status = "Enabled"

    # Expire non-current versions 1 day after they become non-current,
    # but always keep the 100 newest non-current versions.
    noncurrent_version_expiration {
      noncurrent_days           = 1
      newer_noncurrent_versions = 100
    }
  }
}

According to the AWS API reference, the valid value range for newer_noncurrent_versions is 1-100.
https://docs.aws.amazon.com/AmazonS3/latest/API/API_NoncurrentVersionExpiration.html

Note that the AWS Management Console only accepts values up to 100, while the API accepts larger numbers such as 1000 without error. Even so, you should stay within the documented range of 1-100.

You probably don't need to look at a tfstate from 100 generations ago, but if you don't want to throw away valuable tfstate history just to clean up tflock, a workaround is to filter by object size, for example:

resource "aws_s3_bucket_lifecycle_configuration" "example" {
  bucket = "mybucket"

  rule {
    id     = "cleanup-tflock"
    status = "Enabled"

    # Apply the rule only to small objects (tflock files and empty tfstate files),
    # leaving larger tfstate versions untouched.
    filter {
      object_size_less_than = 512
    }

    noncurrent_version_expiration {
      noncurrent_days           = 1
      newer_noncurrent_versions = 100
    }
  }
}

In this example, files smaller than 512 bytes are not retained for more than 100 non-current generations. The 512-byte threshold is based on the following experimental observations:

  • The size of the tflock file is about 200 bytes.
  • The size of an empty tfstate file is also about 200 bytes.
  • The size of a tfstate file containing one resource is about 600 bytes.

In other words, we can expect the history of empty tfstate files to be deleted, while the history of tfstate files that contain resources is unlikely to be affected.

This obviously depends on implementation details, which may differ in your environment and may change in future versions. Still, it is worth noting that with an appropriate threshold, tfstate history can be retained for more than 100 generations while tflock history is cleaned up.

The above is a pragmatic solution under the current constraints, but as noted, it depends heavily on implementation details and is fragile. If the s3 backend were improved to put object tags on tflock files, we could write the following, more reliable rule:

resource "aws_s3_bucket_lifecycle_configuration" "example" {
  bucket = "mybucket"

  rule {
    id     = "cleanup-tflock"
    status = "Enabled"

    # Hypothetical tag that the s3 backend would need to put on tflock files.
    filter {
      tag {
        key   = "FileType"
        value = "tflock"
      }
    }

    noncurrent_version_expiration {
      noncurrent_days = 1
    }
  }

  rule {
    id     = "cleanup-empty"
    status = "Enabled"

    # Intended to also clean up the history left behind by delete markers
    # (see the note below the example).
    filter {
      object_size_less_than = 1
    }

    noncurrent_version_expiration {
      noncurrent_days = 1
    }
  }
}

The tag key and value above are only examples and may differ in an actual implementation; anything works as long as lock files can be distinguished in a filter rule.
Since tags cannot be assigned to delete markers, the delete markers themselves remain in the version history. The second rule, which expires objects smaller than 1 byte, is intended to remove that delete-marker history as well. Whether to also delete the expired_object_delete_marker entries that remain as the current version is a separate consideration, but since the history no longer grows indefinitely, it is a matter of preference.
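If you do decide to remove those lingering delete markers once their non-current versions have expired, a sketch of such a rule (applied bucket-wide here; names are placeholders) could look like this:

resource "aws_s3_bucket_lifecycle_configuration" "example" {
  bucket = "mybucket"

  rule {
    id     = "remove-expired-delete-markers"
    status = "Enabled"

    # An empty filter applies the rule to all objects in the bucket.
    filter {}

    # Removes delete markers that have no remaining non-current versions,
    # so the object's key disappears from version listings entirely.
    expiration {
      expired_object_delete_marker = true
    }
  }
}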
