Elastic Agent Plugin can no longer create new agents after pod restart (or 1 hour) #98
It's not expected that, if it works fine via the Helm chart the first time, it stops working thereafter with no other changes made on your side of things. I assume by "works fine" you mean it can create new pods etc., and then after the restart nothing works at all and it cannot create new pods? There should be many more errors in the logs than that if it cannot create pods, I'd have thought. Which Kubernetes version, Kubernetes type (EKS? GKE? a local single-node deployment?) and GoCD Helm chart version are you using?
Thanks for the reply. By "works fine" I mean that it creates agent pods whenever I trigger the example pipeline in GoCD, the pipeline completes, and then the agent pod shuts down. I'm using EKS with Kubernetes 1.25, and the chart version I'm using is the latest one, 2.5.1. These are the full logs that I get from the gocd-server pod; after that it repeats the same error.
It kind of looks like the service account token you're using somehow doesn't have the permissions for the namespace it's trying to look for agent pods within.
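One way to verify this theory is to ask the Kubernetes API directly whether the plugin's service account can see and create pods. A minimal sketch, assuming the chart is installed in a `gocd` namespace with a `gocd` service account (adjust both to match your install):

```shell
# Check whether the plugin's service account can list agent pods
# in its namespace ("gocd"/"gocd" are assumed names, not from the issue).
kubectl auth can-i list pods \
  --namespace gocd \
  --as system:serviceaccount:gocd:gocd

# The plugin also needs to create agent pods to run jobs.
kubectl auth can-i create pods \
  --namespace gocd \
  --as system:serviceaccount:gocd:gocd
```

If either command prints `no`, the token or its RBAC bindings are the problem rather than the plugin itself.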
OK, I have confirmed that this is because of changes in the way "bound service account tokens" work in Kubernetes.

What this means is that the "magic" the GoCD Helm chart does on first start to populate the service account token into the "cluster profile" uses an auto-mounted token bound to the pod, and it only does this once. When the pod restarts, this token is invalidated by the Kubernetes API and is no longer valid. It may also not work for longer than 1 hour, since the plugin and GoCD cannot re-read the value. Essentially, this makes the automation in the Helm chart deceptive, as it won't work for long.

To fix this right now, you need to do something like mentioned in gocd/kubernetes-elastic-agents#328 to create a long-lived token, extract it from Kubernetes, and then replace the value in your cluster profile with it. Additionally, if you don't manually set the CA Certificate Data, you might experience a different bug I have fixed in a release included in the latest Helm chart version.

In any case, this needs a better fix, as the Helm chart is currently misleading. Will move this issue there, as the main problem is with the misleading behaviour of the Helm chart, which makes you think you don't need to do anything with the service account tokens to keep it working, since on first install it will work for a while without the user doing anything.
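For reference, the long-lived token workaround looks roughly like the following sketch. Since Kubernetes 1.24, service accounts no longer get an automatic long-lived Secret, but you can still create one explicitly. The `gocd` namespace, `gocd` service account, and `gocd-sa-token` Secret name are assumptions here; substitute whatever your chart release uses:

```shell
# Create an explicit long-lived token Secret bound to the service account.
# Names ("gocd" namespace/service account, "gocd-sa-token") are assumed.
kubectl apply --namespace gocd -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: gocd-sa-token
  annotations:
    kubernetes.io/service-account.name: gocd
type: kubernetes.io/service-account-token
EOF

# Extract the token (base64-decoded) to paste into the cluster profile.
kubectl get secret gocd-sa-token --namespace gocd \
  -o jsonpath='{.data.token}' | base64 -d

# Extract the CA certificate data for the "CA Certificate Data" field.
kubectl get secret gocd-sa-token --namespace gocd \
  -o jsonpath='{.data.ca\.crt}' | base64 -d
```

Unlike the auto-mounted bound token, a Secret created this way is not invalidated when the server pod restarts, so the cluster profile value keeps working.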
The root of this problem is now fixed with chart 2.6.0 (which defaults to plugin v4.0.0-505) if you want to give it a go. Note that the server configuration only happens once, so you'd either need to start with a fresh server (probably not ideal), or upgrade the plugin via the Helm chart and then clear out the older configuration values from the Cluster Profile, so the new plugin version falls back to the defaults that the Helm chart has always ensured are set up.
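The upgrade path described above might look something like this. The release name `gocd`, namespace `gocd`, and repo alias `gocd` are assumptions; use whatever names your original `helm install` used:

```shell
# Pull the latest chart index, then upgrade the existing release to the
# fixed chart version. Release/namespace/repo names are assumed, not given
# in the thread.
helm repo update
helm upgrade gocd gocd/gocd --namespace gocd --version 2.6.0
```

After the upgrade, clear the old Security Token and CA Certificate Data values from the plugin's Cluster Profile in the GoCD UI so the new plugin version falls back to the chart-provided defaults.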
Thank you very much for the quick fix and the explanation, works perfectly now 👍
Hello,

I'm using the latest version of this plugin through the GoCD Helm chart. When I deploy the Helm chart for the first time everything works fine, but when I restart the pod (or the pod gets restarted by a Helm upgrade) I get the following error.

Is this the expected behavior when running in a cluster? Do I need to add extra configuration in order to avoid this? Thanks