
Document manual kopia maintenance cleanup with --safety=none #8374

Open
kaovilai opened this issue Nov 6, 2024 · 3 comments
@kaovilai
Member

kaovilai commented Nov 6, 2024

Document that there is a way to clean up faster, but it has caveats and the user will have to run it manually.

> --safety=none could be documented for users as a workaround but not implemented in Velero code. If agreed, we can open a documentation issue for that.

Originally posted by @reasonerjt in #8365 (comment)
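For reference, a minimal sketch of what the manual workaround under discussion could look like, assuming direct access to the backup storage and to the repository password Velero's node agent uses. The bucket, prefix, namespace, and password below are placeholders, and the exact connect flags depend on the object-storage provider:

```sh
# Hypothetical sketch only: connect to the kopia repository Velero uses and run
# full maintenance without the safety margins. <bucket>, <prefix>, <namespace>,
# and <repo-password> are placeholders for a given install.
kopia repository connect s3 \
  --bucket=<bucket> \
  --prefix=<prefix>/kopia/<namespace>/ \
  --password=<repo-password>

# Full maintenance with safety disabled reclaims space immediately instead of
# waiting out kopia's default safety window.
kopia maintenance run --full --safety=none
```

Velero should not be writing to the repository while this runs; see the discussion below about shutting Velero down first.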

@Lyndon-Li
Contributor

Thinking about this again: once we document this, it means we allow users to do this unconditionally and that Velero will work with the repo after maintenance has been run manually with --safety=none.
However, we have never tested it, so we cannot say that maintenance with --safety=none never fails or that Velero always works with it afterward.

Since we already have a solution for the original problem in #8364, we don't want to take on the risk and extra work of testing or troubleshooting. Therefore, I would suggest we reconsider this. @kaovilai @reasonerjt @weshayutin

@weshayutin
Contributor

I agree with @Lyndon-Li generally. A couple of thoughts:

  • I don't know how we would know for sure, from a support perspective, whether a customer used --safety=none, which is the biggest concern to me.
  • I can see situations where a cluster admin is pressured to reduce cloud costs and needs to run without the safety checks.
  • Perhaps state that restoring a backup after maintenance with --safety=none has been executed is not supported. Customers are highly encouraged to test restores immediately if --safety=none has been executed.
  • Question: if a customer ran without safety (and it works), restore works, and then additional incremental backups are taken, are they back to a good-enough position for support?

At the end of the day, I think I would rather see a customer run a fresh backup in a fresh/new backup repository and then delete the old backup and repository than use --safety=none and expect support.
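A minimal sketch of that alternative, assuming an AWS-style provider and placeholder names for the new location, bucket, and backups:

```sh
# Hypothetical sketch: point new backups at a fresh repository instead of
# running --safety=none against the old one. Names and provider are placeholders.
velero backup-location create fresh-bsl \
  --provider aws \
  --bucket <new-bucket> \
  --prefix velero

# Take new backups against the fresh location.
velero backup create nightly-fresh --storage-location fresh-bsl

# After the new backups are verified, delete the old backups (and eventually
# the old bucket/repository).
velero backup delete <old-backup> --confirm
```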

@sseago
Collaborator

sseago commented Nov 25, 2024

@weshayutin So I think the only problem with safety=none arises if backups were run while this happened. In other words, if I run a backup, then shut down Velero, then run maintenance with safety=none -- if that backup wasn't deleted/expired, then none of its blobs should have been removed -- restoring it should work, and further backups are incremental. Basically, full maintenance with safety=none should be similar to what happens with restic maintenance, since restic doesn't have this sort of safety mechanism built in to begin with.

In any case, if we did document this, it would not be to suggest that it works unconditionally. We would absolutely need to recommend that Velero be shut down while this is done. If I'm understanding things correctly, the potential for harm here is limited to what happens if Velero tries to make a new backup while maintenance is running.
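As a rough sketch of that precaution, assuming the default velero namespace and deployment name (which may differ per install):

```sh
# Sketch only: make sure nothing is in progress, then stop the Velero server so
# no new backups start while manual maintenance runs.
velero backup get
kubectl -n velero scale deployment velero --replicas=0

# ... run the manual kopia maintenance with --safety=none here ...

# Bring the server back once maintenance completes.
kubectl -n velero scale deployment velero --replicas=1
```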

@weshayutin "fresh backup/repository" works as long as the user is able to delete all backups in the current repository without any data protection risk. I think the scenario that is most relevant here would be a repository with needed non-expired backups where there was an additional large backup taken that they want to get rid of right now. But maybe that's an edge case we don't need to support. "The only supported solution if there are other backups in the repository that you must retain is to delete the unnecessary backup and wait the time required for regular maintenance to clear it". While today that could be as long as 72 hours for a newly-created backup, once we implement the configurable full maint window, that will drop the worst-case scenario down to 36 hours.
