VMs not being deleted #2
Comments
I checked and two VMs are being created for each workflow. I'm trying to understand it better |
A new VM should be created for each job in a workflow. The VM processes its assigned job and is deleted when the job completes. That is the basic procedure.

In the GitHub workflow view, you can follow it like this: each newly started job (within a workflow) has a yellow dot (with no spinner around it). In the screenshot we see two newly started jobs running in parallel. The github-runner-autoscaler Cloud Run service now provisions the two VMs in GCP. Have a look at the logs of the github-runner-autoscaler Cloud Run service. You should see two 'queued' webhook messages:

It takes about 1 minute until each VM is up and running. Each VM then takes exactly one job (a VM will never take more than one job). As soon as a VM has taken a job, a yellow spinner rotates around the yellow dot. In the screenshots, we can see that both jobs are processed in parallel by two VMs. Once the jobs have been processed, the github-runner-autoscaler Cloud Run service receives two 'completed' webhook messages:

Please try this minimal workflow to determine if the error persists:
Also check whether you see any error messages in the github-runner-autoscaler Cloud Run logs. You can find more detailed documentation on how it works here: https://github.com/Privatehive/gcp-hosted-github-runner/tree/master/runner-autoscaler |
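The lifecycle described above (one VM per job, created on 'queued' and deleted on 'completed') can be sketched as a small dispatch on the webhook's `action` field. This is only a toy model, not the autoscaler's actual code: the `Autoscaler` class, the `runner-<id>` naming scheme, and keying VMs by `workflow_job.id` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Autoscaler:
    """Toy model of the one-VM-per-job lifecycle (hypothetical, for illustration)."""

    # Maps workflow job id -> VM name. Keying by job id is an assumption.
    vms: dict = field(default_factory=dict)

    def handle(self, payload: dict) -> str:
        """Decide what to do for one workflow_job webhook payload."""
        action = payload["action"]
        job_id = payload["workflow_job"]["id"]
        if action == "queued":
            # A new job was queued: provision exactly one VM for it.
            self.vms[job_id] = f"runner-{job_id}"  # hypothetical naming scheme
            return "create_vm"
        if action == "completed":
            # The job finished: its VM is no longer needed.
            self.vms.pop(job_id, None)
            return "delete_vm"
        # Other actions (e.g. "in_progress") need no VM change in this model.
        return "ignored"
```

With two jobs running in parallel, the model holds two VMs after the two 'queued' events and none after the two 'completed' events, which matches the log pattern to look for.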
I don't know why GitHub is sending 3 POST requests to the webhook. |
Did you maybe register the webhook twice, once at the Organization level and once at the Repository level? Does your Cloud Run service allow concurrency > 1? In your logs, I see a workflow job action “waiting” that I have never seen before. You are probably using required reviews in the workflow. I have never tested this case, and it may cause the 'queued' action to be sent twice: once before the review and once after it. If so, I should be able to reproduce it and maybe fix it. |
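One way to tell a double registration apart from a genuinely repeated event is to compare which webhook sent each delivery. The sketch below assumes each request has been logged as a dict with the hypothetical keys `hook_id` (taken from GitHub's `X-GitHub-Hook-ID` delivery header), `action`, and `job_id`; the log format itself is an assumption, not something the autoscaler is known to produce.

```python
from collections import defaultdict


def hooks_per_event(deliveries):
    """Return (action, job_id) pairs that were delivered by more than one webhook.

    `deliveries` is an iterable of dicts with the hypothetical keys
    "hook_id", "action" and "job_id".
    """
    seen = defaultdict(set)
    for d in deliveries:
        seen[(d["action"], d["job_id"])].add(d["hook_id"])
    # Only events that arrived via several distinct hooks hint at a
    # webhook registered twice (e.g. organization and repository level).
    return {key: sorted(hooks) for key, hooks in seen.items() if len(hooks) > 1}
```

If the same 'queued' event shows up with two different hook ids, the webhook is likely registered twice. If it shows up twice with the same hook id, GitHub itself sent it twice, as in the review case.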
I'm using this config: on: I think Github is triggering the workflow twice under the hood, I don't know. |
Okay, I can reproduce it now. When I insert a deployment environment (that requires a review) into the workflow, a “queued” action webhook is sent before the review has actually taken place. Then a second “queued” action webhook is sent after the review. The problem is that we have to ignore the first “queued” webhook, but there is nothing in the webhook payload to detect that a review is pending. The only indicator is that a “waiting” action webhook is sent a few milliseconds after the first “queued” webhook. I'll have to think about a workaround - maybe it's even a GitHub webhook bug that we get a “queued” action webhook before the review. |
But I'm not using a review in this workflow, which is strange. |
I implemented a workaround: if a "waiting" webhook event follows the "queued" event shortly afterwards (within 10 seconds), the create_vm cloud task callback is canceled, so no VM is created for the first "queued" event. Please run |
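The idea behind this workaround can be sketched as a delayed, cancelable VM creation. This is a minimal in-memory model, not the actual implementation, which uses a cloud task callback: the `PendingCreates` class, matching events by job id, and passing timestamps in explicitly are all assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass, field

CANCEL_WINDOW_SECONDS = 10.0  # the 10-second window from the comment above


@dataclass
class PendingCreates:
    """Holds VM creations back until the cancel window has passed (hypothetical)."""

    pending: dict = field(default_factory=dict)  # job id -> time of "queued"

    def on_event(self, action: str, job_id: int, now: float) -> None:
        if action == "queued":
            # Do not create the VM yet; remember when the job was queued.
            self.pending[job_id] = now
        elif action == "waiting":
            queued_at = self.pending.get(job_id)
            if queued_at is not None and now - queued_at <= CANCEL_WINDOW_SECONDS:
                # A review is pending: cancel the creation for the first "queued".
                del self.pending[job_id]

    def due(self, now: float) -> list:
        """Return (and forget) job ids whose window elapsed without a "waiting"."""
        ready = [j for j, t in self.pending.items() if now - t > CANCEL_WINDOW_SECONDS]
        for j in ready:
            del self.pending[j]
        return ready
```

In this model the first "queued" event is dropped when "waiting" arrives a few milliseconds later, while the second "queued" event, sent after the review, survives the window and leads to a VM. The trade-off is that every VM creation is delayed by the length of the window.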
Is it working for you? Can we close the issue? |
I'm still seeing some VMs being created and then shut down because no job was assigned to them. But now I understand what you did, and I can customize the code to meet my needs. |
After using it for some time, I noticed that some VMs were not deleted after the task ended.
I don't know what information I need to provide to help solve this. I don't even know whether it's related to your code or to some GCP problem.