
After upgrading to 1.15, volumes are not excluded but PVCs are, with the exclude label set #8510

Closed
jsilvela opened this issue Dec 12, 2024 · 7 comments · Fixed by #8572

jsilvela commented Dec 12, 2024

What steps did you take and what happened:
This has been detected as a regression in our automated test suite after upgrading from 1.14 to 1.15.

We have a set of pods with their PVCs, some of which we label with velero.io/exclude-from-backup=true.
Since the upgrade to 1.15, all the volumes are being backed up, even those belonging to the labeled PVCs.

What did you expect to happen:
The volumes corresponding to PVCs with the velero.io/exclude-from-backup=true label should not have been backed up.
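The expected behavior boils down to a label lookup on each item. The Go sketch below assumes an item's labels are available as a map[string]string; isExcludedByLabel is a hypothetical helper for illustration, not Velero's actual inclusion-check code.

```go
package main

import "fmt"

// excludeLabel is the label key used to opt an item out of Velero backups.
const excludeLabel = "velero.io/exclude-from-backup"

// isExcludedByLabel is a hypothetical helper: it reports whether an item's
// labels carry velero.io/exclude-from-backup=true.
func isExcludedByLabel(labels map[string]string) bool {
	return labels[excludeLabel] == "true"
}

func main() {
	pvcLabels := map[string]string{excludeLabel: "true"}
	pvLabels := map[string]string{} // the bound PV carries no label in this scenario
	fmt.Println(isExcludedByLabel(pvcLabels), isExcludedByLabel(pvLabels))
}
```

Running it prints true false: the PVC is excluded, but its bound PV, which is not labeled in this scenario, is not excluded by this check on its own.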

The following information will help us better understand what's going on:

From the velero logs, we can see that the PVC cluster-velero-e2e-3 is being excluded.

    time="2024-12-10T08:38:37Z" level=info
      msg="Excluding item because it has label velero.io/exclude-from-backup=true"
      backup=velero/cluster-velero-e2e-backup-1 logSource="pkg/backup/item_backupper.go:115"
      name=cluster-velero-e2e-3 namespace=velero-e2e-8546
      resource=persistentvolumeclaims

However, the corresponding volumes are being backed up.

    time="2024-12-10T08:38:37Z" level=info
      msg="Backing up item"
      backup=velero/cluster-velero-e2e-backup-1 logSource="pkg/backup/item_backupper.go:184"
      name=pvc-cd84b2c0-c8ee-41f2-98ea-e4b803d21351 namespace= resource=persistentvolumes

    time="2024-12-10T08:38:37Z" level=info
      msg="Executing takePVSnapshot"
      backup=velero/cluster-velero-e2e-backup-1
      logSource="pkg/backup/item_backupper.go:549"
      name=pvc-cd84b2c0-c8ee-41f2-98ea-e4b803d21351 namespace= resource=persistentvolumes

And in the output of velero backup describe we can see there are 6 volumes, where there should only be 2:

    Backup Volumes:
      Velero-Native Snapshots:
        pvc-6db130db-fe12-4a8c-a7eb-c5eb5b9b7641: specify --details for more information
        pvc-25019cca-05ba-4f9e-919e-39dcc4ae172a: specify --details for more information
        pvc-6b2024a6-34cd-4aed-9cca-54387a024162: specify --details for more information
        pvc-60febcb2-4621-4c73-81e9-77935226bdb7: specify --details for more information
        pvc-68ede3fe-b20c-486f-8681-0ca8c0145bc4: specify --details for more information
        pvc-22333a02-55c3-45bd-bada-438732491e19: specify --details for more information

Anything else you would like to add:

Environment:

  • Velero version (use velero version): 1.15.0 (plus AWS Plugin v1.11.0)
  • Velero features (use velero client config get features): features: <NOT SET>
  • Kubernetes version (use kubectl version): 1.28 through 1.31
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: Amazon EKS
  • OS (e.g. from /etc/os-release):

blackpiglet (Contributor) commented

Could you give more information about your scenario?
Is the EnableCSI feature enabled?
It would help the investigation if you could upload a debug bundle. The CLI command is velero debug.

jsilvela (Author) commented Dec 30, 2024

Thanks @blackpiglet, sorry for the delay (vacation + release work).
No, we don't activate the EnableCSI feature.
As for the debug bundle, let me get back to you with it.

jsilvela (Author) commented

I've attached the velero debug bundle.
I also ran velero client config get features, which I hadn't done until now. Result: features: <NOT SET>

BTW, I just realized where I had seen your avatar before. Nice!

bundle-2024-12-30-12-59-47.tar.gz

blackpiglet (Contributor) commented

velero/pkg/backup/backup.go, lines 645 to 661 at commit 200435b:

    for _, item := range itemBlock.Items {
        if item.Gr == kuberesource.Pods {
            metadata, key, err := kb.itemMetadataAndKey(item)
            if err != nil {
                itemBlock.Log.WithError(errors.WithStack(err)).Error("Error accessing pod metadata")
                continue
            }
            // Don't run hooks if pod is excluded
            if !itemBlock.itemBackupper.itemInclusionChecks(itemBlock.Log, false, metadata, item.Item, item.Gr) {
                continue
            }
            // Don't run hooks if pod has already been backed up
            if _, exists := itemBlock.itemBackupper.backupRequest.BackedUpItems[key]; !exists {
                preHookPods = append(preHookPods, item)
            }
        }
    }

IMO, this is related to the ItemBlock introduced in v1.15.0, which is used to support parallel backup.
backupItemBlock only checks the exclude label on the Pod resources.

In your scenario, the Pods and PVCs have the exclude label, and the PVs don't.
@sseago

sseago (Collaborator) commented Jan 2, 2025

@blackpiglet This code is only related to running hooks (the ItemBlock change was that we run pre-hooks, then call BackupItem on each item in the block, then run post-hooks). So item exclusion should still be handled within the BackupItem func, as before. I don't think any of that changed as part of the ItemBlock work, but there may have been some other 1.15 work that modified the way the exclude filtering is done.

sseago (Collaborator) commented Jan 2, 2025

@blackpiglet I think I see what's happening, though. It is related to the ItemBlock change, but not due to the code quoted above. In the scenario here, the PVC has velero.io/exclude-from-backup=true but the PV does not. By default, PVs are only included if they're returned as additional items by a plugin. So in 1.14, the PV is returned by the PVC's BackupItemAction (BIA); since the PVC is excluded, the PV won't be pulled in. However, with the ItemBlock code, we also pull items in via the ItemBlockAction plugins.

I think the fix here is to move the itemInclusionChecks call on line 653 to where we build up the ItemBlocks. We shouldn't even be adding excluded items to ItemBlocks. That way the PVC won't be added to the ItemBlock, which means we won't call the PVC's ItemBlockAction plugin that pulls in the (unlabeled) PV, and the 1.14 behavior is preserved here.
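A minimal sketch of the proposed fix, under the same simplifying assumptions: buildItemBlock is a hypothetical function that applies the inclusion check before an item, or anything its ItemBlockAction would pull in, is added to a block. It is not the code from #8572.

```go
package main

import "fmt"

// resource is a simplified, hypothetical model of an item considered for an ItemBlock.
type resource struct {
	kind     string
	name     string
	excluded bool       // true if labeled velero.io/exclude-from-backup=true
	related  []resource // items an ItemBlockAction would pull in, e.g. a PVC's PV
}

// buildItemBlock runs the inclusion check first, so an excluded item never
// reaches its ItemBlockAction and its related items are never pulled in.
func buildItemBlock(root resource) []resource {
	if root.excluded {
		return nil
	}
	block := []resource{root}
	for _, rel := range root.related {
		if !rel.excluded { // related items still get their own check
			block = append(block, rel)
		}
	}
	return block
}

func main() {
	pv := resource{kind: "persistentvolumes", name: "pv-1"}
	excludedPVC := resource{kind: "persistentvolumeclaims", name: "claim-1", excluded: true, related: []resource{pv}}
	includedPVC := resource{kind: "persistentvolumeclaims", name: "claim-2", related: []resource{pv}}
	fmt.Println(len(buildItemBlock(excludedPVC)), len(buildItemBlock(includedPVC)))
}
```

This prints 0 2: the excluded PVC yields an empty block, so its PV is never considered, while an included PVC still brings its PV along.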

I'll reassign this to myself and get a fix out soon.

sseago (Collaborator) commented Jan 2, 2025

@blackpiglet @jsilvela #8572 should fix the issue. I haven't tested the fix yet -- I need to rebuild my cluster tomorrow and then I'll test the change and report back.
