
Enable configuration to select backup deletion mechanism #18

Open
akloss-cibo opened this issue Dec 12, 2024 · 2 comments

@akloss-cibo

Way back in kopeio/etcd-manager#325 it was observed that, in a bucket with object versioning, deleting an object only creates a delete marker. The old versions of the backups are never actually removed by etcd-manager, which can lead to large storage costs.

However, in a bucket with S3 Object Lock enabled, it is typically not possible to delete an object version until its lock is released. For users in this situation, it is preferable to issue a regular S3 delete (which creates the delete marker) and let an S3 lifecycle rule take care of actually deleting the versions.
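
To make the request concrete, here is a minimal Go sketch of what such a setting could look like. All names in it (DeleteMode, ModeDeleteVersion, ModeDeleteMarker, buildDeleteRequest) are hypothetical and do not exist in etcd-manager today. It relies only on the S3 behaviour described above: a DeleteObject call with a version ID removes that version, and a call without one adds a delete marker in a versioned bucket.

```go
package main

import "fmt"

// DeleteMode is a hypothetical configuration value, not an existing etcd-manager option.
type DeleteMode string

const (
	// ModeDeleteVersion permanently deletes a specific object version.
	ModeDeleteVersion DeleteMode = "version"
	// ModeDeleteMarker issues a plain delete and leaves version expiry to an S3 lifecycle rule.
	ModeDeleteMarker DeleteMode = "marker"
)

// deleteRequest holds the two DeleteObject parameters that differ between the modes.
type deleteRequest struct {
	Key       string
	VersionID string // empty means "no version ID", which creates a delete marker
}

// buildDeleteRequest decides whether the version ID is sent, based on the configured mode.
func buildDeleteRequest(mode DeleteMode, key, versionID string) (deleteRequest, error) {
	switch mode {
	case ModeDeleteVersion:
		if versionID == "" {
			return deleteRequest{}, fmt.Errorf("mode %q needs a version ID for key %q", mode, key)
		}
		return deleteRequest{Key: key, VersionID: versionID}, nil
	case ModeDeleteMarker:
		return deleteRequest{Key: key}, nil
	default:
		return deleteRequest{}, fmt.Errorf("unknown delete mode %q", mode)
	}
}

func main() {
	key := "backups/etcd/main/example/etcd.backup.gz" // illustrative key only
	for _, mode := range []DeleteMode{ModeDeleteVersion, ModeDeleteMarker} {
		req, err := buildDeleteRequest(mode, key, "example-version-id")
		fmt.Printf("%s: %+v (err=%v)\n", mode, req, err)
	}
}
```

With something like the marker mode selected, a bucket with Object Lock would never see a versioned delete from etcd-manager, and the lifecycle rule would be the only thing removing versions.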

This manifests very similarly to kubernetes/kops#14031, where the cost of the tier 1 operations that are trying to delete objects gets very high.

@rifelpet
Contributor

Can you give an example error message and HTTP response code when etcd-manager fails to delete an object that has S3 Object Lock? We could implement some caching and backoff for failures of this type.

@akloss-cibo
Author

etcd-manager doesn't actually record an error at all, but you can see here it tries to delete the same thing more than once (because it is unable to delete it):

% logs etcd-manager-main-i-02ddfc9eebbeb1a96 | grep 'remov.* backup "2024-10-17T20:07:32Z-000525"'
Defaulted container "etcd-manager" out of: etcd-manager, kops-utils-cp (init), init-etcd-3-4-13 (init), init-etcd-3-5-9 (init), init-etcd-symlinks-3-4-13 (init), init-etcd-symlinks-3-5-9 (init)
I1217 02:00:14.525485    4775 cleanup.go:173] removing backup "2024-10-17T20:07:32Z-000525"
I1217 02:00:14.704003    4775 cleanup.go:177] removed backup "2024-10-17T20:07:32Z-000525"
I1217 03:10:46.573513    4775 cleanup.go:173] removing backup "2024-10-17T20:07:32Z-000525"
I1217 03:10:46.686841    4775 cleanup.go:177] removed backup "2024-10-17T20:07:32Z-000525"
%

Attempting a versioned delete with the AWS CLI (aws s3api) fails like this (bucket and prefix obfuscated):

% aws s3api delete-object --bucket my-bucket --key kops/my-cluster.example.com/backups/etcd/main/2024-12-17T02:59:52Z-000019/etcd.backup.gz --version-id oJwBuWF04gUN4jImBxIpXFnxNe5_3bM6

An error occurred (AccessDenied) when calling the DeleteObject operation: Access Denied because object protected by object lock.
%

This is the relevant snippet of output from the CLI with --debug on:

2024-12-16 21:19:01,849 - MainThread - urllib3.connectionpool - DEBUG - https://my-bucket.s3.us-east-1.amazonaws.com:443 "DELETE /kops/my-cluster/backups/etcd/main/2024-12-17T02%3A59%3A52Z-000019/etcd.backup.gz?versionId=oJwBuWF04gUN4jImBxIpXFnxNe5_3bM6 HTTP/1.1" 403 None
2024-12-16 21:19:01,850 - MainThread - botocore.hooks - DEBUG - Event before-parse.s3.DeleteObject: calling handler <function _handle_200_error at 0x102209ee0>
2024-12-16 21:19:01,850 - MainThread - botocore.hooks - DEBUG - Event before-parse.s3.DeleteObject: calling handler <function handle_expires_header at 0x102209d00>
2024-12-16 21:19:01,850 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amz-request-id': 'DXB3VZGZRM9BGQN8', 'x-amz-id-2': 'j0DaBvsJDX1u/c4zq5l/fXII/XL/G9ehoFLN0XlKcl8igUGlRc9J4E/iBKDj3YsCltQ+DC8c4sc=', 'Content-Type': 'application/xml', 'Transfer-Encoding': 'chunked', 'Date': 'Tue, 17 Dec 2024 03:19:01 GMT', 'Server': 'AmazonS3'}
2024-12-16 21:19:01,850 - MainThread - botocore.parsers - DEBUG - Response body:
b'<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied because object protected by object lock.</Message><RequestId>DXB3VZGZRM9BGQN8</RequestId><HostId>j0DaBvsJDX1u/c4zq5l/fXII/XL/G9ehoFLN0XlKcl8igUGlRc9J4E/iBKDj3YsCltQ+DC8c4sc=</HostId></Error>'

You can see the 403 response code and the error in the XML body.
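
For reference, a minimal Go sketch of how a client could recognise this particular failure from the status code and XML body shown above. The function name isObjectLockDenied is hypothetical, and matching on the message text is an assumption based only on this one response; the wording is not guaranteed to be stable.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"net/http"
	"strings"
)

// s3Error maps the <Code> and <Message> elements of the S3 XML error body.
type s3Error struct {
	Code    string `xml:"Code"`
	Message string `xml:"Message"`
}

// isObjectLockDenied reports whether a response looks like the Object Lock
// rejection above: HTTP 403, code AccessDenied, message mentioning object lock.
// The message match is a heuristic derived from the single response in this thread.
func isObjectLockDenied(status int, body []byte) bool {
	if status != http.StatusForbidden {
		return false
	}
	var e s3Error
	if err := xml.Unmarshal(body, &e); err != nil {
		return false
	}
	return e.Code == "AccessDenied" &&
		strings.Contains(strings.ToLower(e.Message), "object lock")
}

func main() {
	// Body trimmed from the debug output above (request and host IDs omitted).
	body := []byte(`<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied because object protected by object lock.</Message></Error>`)
	fmt.Println(isObjectLockDenied(403, body)) // prints: true
}
```

A plain AccessDenied caused by missing IAM permissions would not match, so the two cases could be handled differently.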

justinsb added a commit to justinsb/etcd-manager-1 that referenced this issue Dec 27, 2024
We update the last-run timestamp before running cleanup, so that
we don't retry immediately if the cleanup fails.

Issue kubernetes-sigs#18