VMs not being deleted #2

Open
ssilva-soap opened this issue Jan 7, 2025 · 11 comments

@ssilva-soap

After some time using it, I noticed that some VMs were not deleted after the task ended.
I don't know what information I need to provide to help solve it. I don't even know if it's related to your code or some GCP problem.

@ssilva-soap
Author

I checked and two VMs are being created for each workflow.
So when the workflow finishes, only one VM is deleted.

I'm trying to understand it better

@Tereius
Collaborator

Tereius commented Jan 8, 2025

A new VM should be created for each job in a workflow. The VM then processes the assigned job and is deleted when the job is completed. This is the basic procedure. In the GitHub workflow view, you can follow it like this:

Each newly started job (within a workflow) has a yellow dot (with no spinner around it) - in the screenshot we see two newly started jobs running in parallel.
image

The two VMs are now provisioned in the GCP by the github-runner-autoscaler CloudRun. Have a look into the logs of the github-runner-autoscaler CloudRun - you should see two 'queued' webhook messages:
image
image

It takes about 1 minute until each VM is up and running. Each VM then takes exactly one job (a VM will never take more than one job). As soon as a VM has taken a job, the yellow spinner is displayed, which rotates around the yellow dot - in the screenshots, we can see that both jobs are processed in parallel by two VMs.
image
image

Once the jobs have been processed, the github-runner-autoscaler CloudRun will receive two 'completed' webhook messages:
image
image
Both VMs are then deleted by the github-runner-autoscaler CloudRun.
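
The procedure above can be sketched as a small dispatcher. This is only an illustration, not the autoscaler's actual code: `FakeCloud`, the `runner-N` naming, and deleting by `workflow_job.runner_name` are assumptions made for the example. It also shows why an extra 'queued' message leaves a VM behind: only the VM that took the job is named in the 'completed' message.

```python
import itertools


class FakeCloud:
    """Hypothetical stand-in for the GCP API; it only tracks VM names."""

    def __init__(self):
        self.vms = set()
        self._counter = itertools.count(1)

    def create_vm(self):
        # Assumption: every new VM gets a fresh, unique name.
        name = f"runner-{next(self._counter)}"
        self.vms.add(name)
        return name

    def delete_vm(self, name):
        self.vms.discard(name)


def handle_workflow_job(payload, cloud):
    """Map one workflow_job webhook payload to a VM action."""
    action = payload.get("action")
    job = payload.get("workflow_job", {})
    if action == "queued":
        # One new VM per 'queued' message.
        return "created " + cloud.create_vm()
    if action == "completed":
        # Assumption: the VM to delete is the runner that processed the job.
        runner = job.get("runner_name")
        if runner:
            cloud.delete_vm(runner)
            return "deleted " + runner
    # Any other action (e.g. 'in_progress', 'waiting') is ignored here.
    return "ignored"
```

With two 'queued' messages and a single 'completed' message for `runner-1`, `cloud.vms` still contains `runner-2` afterwards.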

Please try this minimal workflow to determine if the error persists:

name: test
on: [push]
jobs:
  FirstJob:
    runs-on: self-hosted
    name: "FirstJob"
    steps:
      - run: echo "Hello from VM one"; sleep 60
  SecondJob:
    runs-on: self-hosted
    name: "SecondJob"
    steps:
      - run: echo "Hello from VM two"; sleep 60

Also check if you see any error messages in the github-runner-autoscaler CloudRun logs. You can find more detailed documentation on how it works here: https://github.com/Privatehive/gcp-hosted-github-runner/tree/master/runner-autoscaler

@ssilva-soap
Author

ssilva-soap commented Jan 9, 2025

I discovered that two VMs are being created when one workflow with only one task is started.
So only the VM that is used to run the task is being deleted.
That's the evidence:

image

@ssilva-soap
Author

I don't know why GitHub is sending 3 POST requests to the webhook.
It sends "action:queued", then "action:waiting", and then "action:queued" again.
That's why 2 different VMs are being created.

@Tereius
Collaborator

Tereius commented Jan 9, 2025

Did you maybe register the webhook twice? Maybe one time on the Organization level and one time on the Repository level?

Does your Cloud Run allow concurrency > 1?
image
It is important that each webhook is acknowledged within 10 seconds. Otherwise the webhook will be sent again, resulting in duplicates.
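
One common way to stay inside such a time limit is to acknowledge the request immediately and do the slow work elsewhere. The sketch below is a generic illustration, not the autoscaler's implementation: the in-process queue, the `receive_webhook` and `worker` names, and the 202 status are assumptions chosen for the example.

```python
import queue
import threading

# Hypothetical in-process work queue standing in for a real task queue.
work = queue.Queue()


def receive_webhook(payload):
    """Store the payload and acknowledge right away; no slow work here."""
    work.put(payload)
    return 202  # HTTP status the endpoint would answer with


def worker(handled):
    """Process payloads in the background until a None sentinel arrives."""
    while True:
        payload = work.get()
        if payload is None:
            work.task_done()
            break
        # Placeholder for slow work such as provisioning or deleting a VM.
        handled.append(payload["action"])
        work.task_done()


if __name__ == "__main__":
    done = []
    thread = threading.Thread(target=worker, args=(done,))
    thread.start()
    receive_webhook({"action": "queued"})
    work.put(None)  # stop the worker
    thread.join()
    print(done)
```

The handler returns as soon as the payload is queued, so its response time does not depend on how long the VM operation takes.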

In your logs, I see a workflow job action “waiting” that I have never seen before. You are probably using required reviews in the workflow. I have never tested this case, and it may cause the 'queued' action to be sent twice: once before the review and once after the review. If so, I should be able to reproduce it and maybe fix it.

@ssilva-soap
Author

I'm using this config:

on:
  push:
    branches:
      - development
  pull_request:
    branches:
      - development

I think GitHub is triggering the workflow twice under the hood, but I'm not sure.
I will test it with only one trigger to see if that's the problem, and I'll let you know.

@Tereius
Collaborator

Tereius commented Jan 9, 2025

Okay, I can reproduce it now. When I insert a deployment environment (that requires a review) into the workflow, a “queued” action webhook is sent before the review has actually taken place. Then a second “queued” action webhook is sent after the review. The problem is that we have to ignore the first “queued” webhook, but there is nothing in the webhook payload to detect that a review is pending. The only indicator is that a “waiting” action webhook is sent a few milliseconds after the first “queued” webhook. I'll have to think about a workaround - maybe it's even a GitHub webhook bug that we get a “queued” action webhook before the review.

@ssilva-soap
Author

But I'm not using reviews in this workflow, which is strange.
I'll test a few other things here, and I'll let you know.

@Tereius
Collaborator

Tereius commented Jan 11, 2025

I implemented a workaround: if a "waiting" webhook event follows the "queued" event shortly afterwards (within 10 seconds), the create_vm cloud task callback is canceled, so no VM is created for the first "queued" event. Please run terraform init -upgrade to update the module to the latest version.
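
The idea behind this workaround can be sketched roughly as follows. This is not the module's actual code: `PendingCreates`, its methods, and the assumption that the "waiting" event carries the same job id as the preceding "queued" event are hypothetical, and timestamps are passed in explicitly instead of using real timers or Cloud Tasks.

```python
from dataclasses import dataclass, field

GRACE_SECONDS = 10.0  # window in which a "waiting" event cancels the VM creation


@dataclass
class PendingCreates:
    """Delays VM creation so a following "waiting" event can cancel it."""

    grace: float = GRACE_SECONDS
    _deadlines: dict = field(default_factory=dict)  # job id -> earliest creation time

    def on_event(self, action, job_id, now):
        if action == "queued":
            # Schedule the creation instead of starting the VM right away.
            self._deadlines[job_id] = now + self.grace
        elif action == "waiting":
            # Assumption: same job id as the "queued" event that preceded it.
            self._deadlines.pop(job_id, None)

    def due(self, now):
        """Return (and forget) the job ids whose grace period has expired."""
        ready = sorted(j for j, t in self._deadlines.items() if t <= now)
        for job_id in ready:
            del self._deadlines[job_id]
        return ready
```

The caller would start one VM per job id returned by `due`. The trade-off of this design is that every VM start is delayed by the grace period, even for jobs that never send a "waiting" event.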

@Tereius
Collaborator

Tereius commented Jan 15, 2025

Is it working for you? Can we close the issue?

@ssilva-soap
Author

I'm still seeing some VMs being created and then shut down because no job was assigned to them.
And now I see that VMs are taking longer to be ready when I start more jobs simultaneously, because of the timer you implemented, I think.

But now I understand what you did, and I can customize the code to meet my needs.
Thanks for the help!
