helm-operator fails to annotate some resources #6472

Closed
lufinima opened this issue Jun 22, 2023 · 9 comments
Comments

@lufinima

Bug Report

The helm-operator fails to annotate some resources, which means that chart updates will fail.

Description

I've created a helm-operator to deploy the nginx ingress controller. The first version of this operator was created using operator-sdk version 1.24.
The command to create the operator was as follows:

operator-sdk init \
  --plugins helm \
  --helm-chart ingress-nginx \
  --helm-chart-repo https://kubernetes.github.io/ingress-nginx \
  --helm-chart-version 4.0.3 \
  --domain helm.k8s.io \
  --group charts \
  --version v1 \
  --kind NginxIngressController

I then updated operator-sdk to version 1.29 and the ingress-nginx chart to version 4.6.1:

operator-sdk init \
  --plugins helm \
  --helm-chart ingress-nginx \
  --helm-chart-repo https://kubernetes.github.io/ingress-nginx \
  --helm-chart-version 4.6.1 \
  --domain helm.k8s.io \
  --group charts \
  --version v1 \
  --kind NginxIngressController

When I try to upgrade from the first version of the operator to the second one, everything seems to work except that the ingress controller never gets updated; the operator logs the following error while reconciling:

failed to get candidate release: rendered manifests contain a resource that already exists. Unable to continue with update: HorizontalPodAutoscaler "nina-annotation-controller" in namespace "ingress-controller-operator" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "nina-annotation"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "ingress-controller-operator"

After investigating the issue, it seems that the operator doesn't annotate the HorizontalPodAutoscaler resource with

metadata:
  annotations:
    meta.helm.sh/release-name: nina-annotation
    meta.helm.sh/release-namespace: ingress-controller-operator

while, for example, the Deployment resource does get annotated:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: nina-annotation
    meta.helm.sh/release-namespace: ingress-controller-operator
  creationTimestamp: "2023-06-20T09:33:30Z"

So far I've only discovered these missing annotations on the HorizontalPodAutoscaler, but the same might be happening with other resources.
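
The error above is Helm's ownership-metadata check: an existing object can normally be adopted by a release once it carries the two release annotations plus the app.kubernetes.io/managed-by=Helm label. A minimal sketch of adding them by hand to the HPA named in the error (assuming Helm's standard adoption behaviour and direct kubectl access to the cluster):

# Add the ownership metadata the error message asks for, so the existing
# HPA can be adopted by the "nina-annotation" release.
kubectl annotate horizontalpodautoscaler nina-annotation-controller \
  -n ingress-controller-operator \
  meta.helm.sh/release-name=nina-annotation \
  meta.helm.sh/release-namespace=ingress-controller-operator
# Helm also expects the managed-by label on adopted resources.
kubectl label horizontalpodautoscaler nina-annotation-controller \
  -n ingress-controller-operator \
  app.kubernetes.io/managed-by=Helm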

Workaround to minimize bug impact

Because this blocked the operator upgrade, the only way I found to upgrade both the operator and the ingress controllers correctly was to disable autoscaling in my Custom Resource before updating the controller, and to enable autoscaling again only after everything had been updated as expected.
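
A sketch of that workaround with kubectl patch, assuming the Custom Resource is named nina-annotation, its resource is nginxingresscontrollers.charts.helm.k8s.io, and the chart value controller.autoscaling.enabled is exposed directly under .spec (helm-operator normally maps spec fields one-to-one onto chart values):

# Disable autoscaling in the Custom Resource before upgrading the operator...
kubectl patch nginxingresscontrollers.charts.helm.k8s.io nina-annotation \
  -n ingress-controller-operator --type merge \
  -p '{"spec":{"controller":{"autoscaling":{"enabled":false}}}}'
# ...and re-enable it once everything has been upgraded.
kubectl patch nginxingresscontrollers.charts.helm.k8s.io nina-annotation \
  -n ingress-controller-operator --type merge \
  -p '{"spec":{"controller":{"autoscaling":{"enabled":true}}}}'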

Environment

minikube

minikube version: v1.30.1
commit: 08896fd1dc362c097c925146c4a0d0dac715ace0

minikube setup with

minikube start --cpus 4 --driver=docker --addons ingress --addons ingress-dns --addons metrics-server --kubernetes-version=1.24.8

operator-sdk

operator-sdk version: "v1.29.0", commit: "78c564319585c0c348d1d7d9bbfeed1098fab006", kubernetes version: "1.26.0", go version: "go1.19.9", GOOS: "darwin", GOARCH: "arm64"
@kensipe kensipe added the triage/needs-information Indicates an issue needs more information in order to work on it. label Jun 26, 2023
@kensipe kensipe added this to the Backlog milestone Jun 26, 2023
@horis233

horis233 commented Jul 5, 2023

We also observe the same problem when upgrading operator-sdk from v1.22 to v1.28, and it only happens sometimes. We believe it could be caused by the newer version of operator-sdk.

I am considering whether reverting the operator-sdk version could be a fix for this problem.
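
Since it only happens sometimes, it may help to confirm which resources are actually missing the ownership metadata before attempting an upgrade. A small diagnostic sketch, assuming kubectl access and the resource names from the report above:

# Print the release-name annotation of the HPA; an empty result means the
# ownership metadata is missing and the next upgrade will fail to adopt it.
kubectl get horizontalpodautoscaler nina-annotation-controller \
  -n ingress-controller-operator \
  -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}'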

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2023
@lufinima

lufinima commented Oct 4, 2023

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2023
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2024
@lufinima

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2024
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 19, 2024
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close


openshift-ci bot commented Jun 19, 2024

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot closed this as completed Jun 19, 2024