Configurable Kopia Maintenance Interval #8364
Comments
I can be assigned this issue |
@kaovilai I think what we want here is just a bool entry -- "alwaysUseFullMaintenance" or something. |
Sure. Bool entry if that works for everyone. |
Could we clarify the scenarios in which we want users to configure the maintenance mode? Basically, we don't want users to change the mode, because full maintenance and quick maintenance are very different from each other: they are designed to alternate, with the quick one running more frequently. Changing the mode manually may cause unexpected consequences:
Keeping data for a reasonable time is a Kopia policy that ensures the system works correctly; manually changing the maintenance mode will not cause the data to be deleted any earlier. |
Another point:
Therefore, it is neither safe nor necessary to add the maintenance mode to the Unified Repository. At present, we let the repo itself decide how to do maintenance, including the mode and frequency, and offload the maintenance work entirely to the repo. |
We probably want this to occur more often, at least for testing/debugging. And we've been getting customer cases reporting that maintenance does not actually work for them: backups expire but nothing gets deleted. |
This may be the expected behavior, e.g., the data may still be referenced by other backups and should not be deleted.
For this purpose, if the debugging happens in users' production environments, changing anything about maintenance is still not recommended, since this may result in loss of users' data; if the testing/debugging happens in our dev environments, I think we can change the code locally. Moreover, as mentioned above, the margins of many sub-tasks need to be adjusted, so changing only the mode may not make it work as expected. |
@Lyndon-Li "Full maintenance deletes data that quick maintenance doesn't. Running full maintenance too frequent causing the data to be deleted earlier unnecessarily and may result in potential data lose." There shouldn't be any risk here, since kopia requires 4 separate full maintenance cycles at least four hours apart before it will remove any data. The concern is that with the default "once a day" full maintenance, a blob is removed 24 hours at the earliest, and possibly as late as 48 hours, after it is no longer referenced by a needed snapshot. We could reduce this window to 4+ hours if full maintenance ran more often. But even if you ran full maintenance constantly (which we wouldn't actually want), it shouldn't put the data at risk, because kopia's built-in safety mechanisms require GC to mark a blob as safe to delete during two separate full maint cycles at least 4 hours apart. |
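To make the timing argument above concrete, here is a minimal Go sketch of the deletion window under a simplified model: a blob is removed once two full maintenance cycles, at least 4 hours apart, have both seen it as unreferenced. The helper name `deletionWindow` and the constant `minSafetyGap` are hypothetical and only illustrate the arithmetic; they are not Kopia or Velero APIs, and Kopia's real safety parameters may involve more conditions than this.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// minSafetyGap is the minimum spacing between the two full maintenance
// cycles that must both see a blob as unreferenced. The 4h value is taken
// from the comment above, not read from Kopia itself.
const minSafetyGap = 4 * time.Hour

// deletionWindow is a hypothetical helper. Given how often full maintenance
// runs, it returns the earliest and latest delay between a blob becoming
// unreferenced and the cycle that may delete it, under the simplified
// two-cycle model described in the lead-in.
func deletionWindow(fullInterval time.Duration) (earliest, latest time.Duration, err error) {
	if fullInterval <= 0 {
		return 0, 0, errors.New("full maintenance interval must be positive")
	}
	// Smallest number of intervals that spans at least minSafetyGap
	// (ceiling division).
	cycles := int64((minSafetyGap + fullInterval - 1) / fullInterval)
	gap := time.Duration(cycles) * fullInterval

	// Best case: the blob becomes unreferenced just before a full cycle,
	// so only the gap between the two marking cycles has to pass.
	// Worst case: it becomes unreferenced just after a full cycle, so one
	// extra interval passes before the first marking cycle.
	return gap, gap + fullInterval, nil
}

func main() {
	for _, interval := range []time.Duration{24 * time.Hour, 12 * time.Hour, 6 * time.Hour} {
		earliest, latest, err := deletionWindow(interval)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("full maintenance every %v: blob removed between %v and %v after becoming unreferenced\n",
			interval, earliest, latest)
	}
}
```

Under this model the default 24-hour interval gives the 24 to 48 hour window mentioned above, and a 6-hour interval shrinks it to 6 to 12 hours.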
I don't know if this is possible, but maybe there's a way to configure the kopia repo to do full maintenance more than once per day when velero runs maintenance with "mode=auto" -- that might be cleaner than a config to always run full, but I don't know whether that can be done. Then we could have behavior where full is done every 6 hours but quick every hour. |
It looks like we probably can do that here:
Maybe making this configurable is preferable to an "always use full maint" flag. Then we could recommend for users who want data to be deleted more quickly to set this to 6 or 12 hours instead of the default 24. |
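The "full every 6 hours, quick every hour" behavior suggested above could be sketched roughly as follows. This is only an illustration of the scheduling idea, not how Kopia's "auto" mode is actually implemented: the `schedule` type, `pick`, and `simulate` are hypothetical names, and the sketch assumes that a full run also counts as a quick run.

```go
package main

import (
	"fmt"
	"time"
)

type mode string

const (
	modeNone  mode = "none"
	modeQuick mode = "quick"
	modeFull  mode = "full"
)

// schedule is a hypothetical container for the two intervals.
type schedule struct {
	quickInterval time.Duration
	fullInterval  time.Duration
}

// pick decides which maintenance mode is due, given how long ago each
// mode last ran. Full maintenance takes priority when both are due.
func (s schedule) pick(sinceQuick, sinceFull time.Duration) mode {
	if sinceFull >= s.fullInterval {
		return modeFull
	}
	if sinceQuick >= s.quickInterval {
		return modeQuick
	}
	return modeNone
}

// simulate invokes maintenance once per tick for the given total duration
// and counts how often each mode runs.
func simulate(s schedule, total, tick time.Duration) (full, quick int) {
	if tick <= 0 {
		return 0, 0
	}
	var lastQuick, lastFull time.Duration // elapsed time at which each mode last ran
	for now := tick; now <= total; now += tick {
		switch s.pick(now-lastQuick, now-lastFull) {
		case modeFull:
			full++
			// Assumption: a full run also covers the quick tasks.
			lastFull, lastQuick = now, now
		case modeQuick:
			quick++
			lastQuick = now
		}
	}
	return full, quick
}

func main() {
	s := schedule{quickInterval: time.Hour, fullInterval: 6 * time.Hour}
	full, quick := simulate(s, 24*time.Hour, time.Hour)
	fmt.Printf("over 24h: %d full runs, %d quick runs\n", full, quick)
}
```

With hourly invocations, only the full interval needs to change to trade resource usage against how quickly unreferenced data is removed; the quick cadence stays the same.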
Yes, if you change the mode but not any margin, full maintenance has no additional effect and only consumes more resources; if you change the mode and also some margins, data will be put at risk. |
This looks more rational. The overwrite value could be set to
|
@Lyndon-Li I think that's fine. 24/12/6 hour options should be sufficient. There's zero value in full maint more often than 4 hours, and exactly 4 hours could produce edge cases (i.e. last full maint marked this blob 3:59:58 ago and therefore it's too soon to delete now by 2 seconds), and 5 hours doesn't give you consistent day-to-day maint times. So 6 is realistically the smallest value that makes sense. |
@Lyndon-Li I really like your idea of having pre-set options; that makes it easy for the user to configure while preserving the underlying repo requirements (e.g. <4h doesn't make sense, so the user can't set unacceptable parameters). |
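A minimal Go sketch of how pre-set options could be enforced, assuming the value arrives as a duration string such as "6h". The function name `parseFullMaintenanceInterval`, the string format, and the fallback to 24 hours for an empty value are all assumptions made for illustration; they are not the actual Velero configuration keys or code.

```go
package main

import (
	"fmt"
	"time"
)

// allowedFullIntervals lists the pre-set options discussed above (24/12/6 hours).
var allowedFullIntervals = []time.Duration{24 * time.Hour, 12 * time.Hour, 6 * time.Hour}

// defaultFullInterval is the "once a day" default mentioned in the thread.
const defaultFullInterval = 24 * time.Hour

// parseFullMaintenanceInterval is a hypothetical helper: it accepts an
// empty string (meaning "use the default") or one of the pre-set
// durations, and rejects everything else, including values below 4h.
func parseFullMaintenanceInterval(value string) (time.Duration, error) {
	if value == "" {
		return defaultFullInterval, nil
	}
	d, err := time.ParseDuration(value)
	if err != nil {
		return 0, fmt.Errorf("invalid full maintenance interval %q: %w", value, err)
	}
	for _, allowed := range allowedFullIntervals {
		if d == allowed {
			return d, nil
		}
	}
	return 0, fmt.Errorf("unsupported full maintenance interval %v, want one of %v", d, allowedFullIntervals)
}

func main() {
	for _, v := range []string{"", "24h", "12h", "6h", "4h", "often"} {
		d, err := parseFullMaintenanceInterval(v)
		if err != nil {
			fmt.Printf("%q rejected: %v\n", v, err)
			continue
		}
		fmt.Printf("%q accepted: full maintenance every %v\n", v, d)
	}
}
```

Restricting input to a fixed list keeps the repo's minimum spacing requirement out of the user's hands, which is the point made above.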
@kaovilai |
Describe the problem/challenge you have
We want the ability to configure the maintenance interval so that changes to storage take effect more quickly in some cases.
These can be configured in the repo-maintenance-job-configmap.
Describe the solution you'd like
Anything else you would like to add:
Environment:
velero version:
kubectl version:
/etc/os-release:
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.
cc: @shubham-pampattiwar @weshayutin