Runner pod ephemerality with emptyDir #481
The right way to have ephemeral pods is to use the ephemeral flag on the pod: https://github.com/myoung34/docker-github-actions-runner#environment-variables. These pods will then start up and run their job. After they finish, they should get into status Completed and eventually be deleted. To control the scaling (and thus avoid running out of resources), you set the `maxRunners: 18` and `minRunners: 0` fields in the CR.
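With one ephemeral pod per job, maxRunners caps how many pods, and therefore how many concurrent jobs, can consume node resources at once. Here is a minimal sketch of that bound. It assumes the operator simply keeps the runner count between the two CR fields. `clampRunners` is a hypothetical helper for illustration, not the operator's actual scaling code.

```go
package main

import "fmt"

// clampRunners bounds a desired runner count by the CR's minRunners and
// maxRunners fields. Hypothetical helper for illustration only; it is not
// the operator's actual scaling code.
func clampRunners(desired, minRunners, maxRunners int) int {
	if desired < minRunners {
		return minRunners
	}
	if desired > maxRunners {
		return maxRunners
	}
	return desired
}

func main() {
	// minRunners: 0 and maxRunners: 18, as suggested above.
	for _, desired := range []int{0, 5, 40} {
		fmt.Printf("desired=%d -> runners=%d\n", desired, clampRunners(desired, 0, 18))
	}
}
```

Even if 40 jobs were queued, at most 18 runner pods would exist at once under these bounds.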
Hi @davidkarlsen, thanks for your response. I have the scaling configured in my deployment, and it works fine when using the … My config:

```yaml
apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-poolsandbox
  namespace: github-actions-runner-operator
spec:
  minRunners: 1
  maxRunners: 6
  organization: jugo-io
  reconciliationPeriod: 1m
  tokenRef:
    key: GH_TOKEN
    name: actions-runner
  podTemplateSpec:
    metadata:
      annotations:
        prometheus.io/scrape: 'false'
        prometheus.io/port: '3903'
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: garo.tietoevry.com/pool
                      operator: In
                      values:
                        - runner-poolsandbox
      containers:
        - name: runner
          env:
            - name: RUNNER_DEBUG
              value: 'true'
            - name: DOCKER_TLS_CERTDIR
              value: /certs
            - name: DOCKER_HOST
              value: 'tcp://localhost:2376'
            - name: DOCKER_TLS_VERIFY
              value: '1'
            - name: DOCKER_CERT_PATH
              value: /certs/client
            - name: GH_ORG
              value: jugo-io
            - name: RUNNER_SCOPE
              value: org
            - name: ORG_NAME
              value: jugo-io
            - name: ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  name: actions-runner
                  key: GH_TOKEN
            - name: ACTIONS_RUNNER_INPUT_LABELS
              value: sandbox
            - name: LABELS
              value: 'self-hosted,sandbox'
            - name: ACTIONS_RUNNER_INPUT_EPHEMERAL
              value: 'true'
            - name: EPHEMERAL
              value: 'true'
          envFrom:
            - secretRef:
                name: runner-poolsandbox-regtoken
          image: 'quay.io/evryfs/github-actions-runner:myoung34-derivate'
          imagePullPolicy: IfNotPresent
          resources: {}
          volumeMounts:
            - mountPath: /certs
              name: docker-certs
            - mountPath: /home/runner/_diag
              name: runner-diag
            - mountPath: /home/runner/_work
              name: runner-work
        - name: docker
          env:
            - name: DOCKER_TLS_CERTDIR
              value: /certs
          image: 'docker:stable-dind'
          imagePullPolicy: Always
          args:
            - '--mtu=1430'
          resources: {}
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /var/lib/docker
              name: docker-storage
            - mountPath: /certs
              name: docker-certs
            - mountPath: /home/runner/_work
              name: runner-work
        - name: exporter
          image: 'quay.io/evryfs/github-actions-runner-metrics:v0.0.3'
          ports:
            - containerPort: 3903
              protocol: TCP
          volumeMounts:
            - name: runner-diag
              mountPath: /_diag
              readOnly: true
      volumes:
        - emptyDir: {}
          name: runner-work
        - emptyDir: {}
          name: runner-diag
        - emptyDir: {}
          name: mvn-repo
        - emptyDir: {}
          name: docker-storage
        - emptyDir: {}
          name: docker-certs
```

With this config, the pod starts and the job runs. Then the runner container restarts while the pod remains. It never enters the Completed state. I think this is because of the problems in the operator logs: for some reason, the operator doesn't seem to be able to scale/reconcile the pod when using the myoung34 derivate image.
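The "container restarts but the pod never completes" behaviour is consistent with how Kubernetes derives pod phase. The podTemplateSpec above does not set `restartPolicy`. Unless the operator sets one, Kubernetes therefore uses its default, `Always`, under which an exited container is always restarted.

Below is a simplified model of that rule, not kubelet or operator code. It ignores init containers and restart back-off, and it assumes the ephemeral runner exits with code 0 after its job. It also shows a second, independent factor: the `docker` and `exporter` containers never exit, so the pod would stay Running even with `restartPolicy: Never`.

```go
package main

import "fmt"

// RestartPolicy mirrors the three values of the pod-level restartPolicy
// field; Always is the Kubernetes default.
type RestartPolicy string

const (
	Always    RestartPolicy = "Always"
	OnFailure RestartPolicy = "OnFailure"
	Never     RestartPolicy = "Never"
)

// Container is a simplified view of one container's state.
type Container struct {
	Name     string
	Exited   bool // the process has terminated
	ExitCode int  // only meaningful when Exited is true
}

// willRestart reports whether an exited container would be restarted
// under the given policy (simplified: no back-off timing).
func willRestart(p RestartPolicy, c Container) bool {
	switch p {
	case Always:
		return true
	case OnFailure:
		return c.ExitCode != 0
	default: // Never
		return false
	}
}

// podPhase models the rule that a pod is only finished once every
// container has exited and none of them will be restarted.
func podPhase(p RestartPolicy, containers []Container) string {
	failed := false
	for _, c := range containers {
		if !c.Exited || willRestart(p, c) {
			return "Running"
		}
		if c.ExitCode != 0 {
			failed = true
		}
	}
	if failed {
		return "Failed"
	}
	return "Succeeded" // kubectl shows this as "Completed"
}

func main() {
	runnerDone := Container{Name: "runner", Exited: true} // ephemeral runner finished its job
	dind := Container{Name: "docker"}                     // still running
	exporter := Container{Name: "exporter"}               // still running

	fmt.Println(podPhase(Always, []Container{runnerDone, dind, exporter})) // Running: runner is restarted
	fmt.Println(podPhase(Never, []Container{runnerDone, dind, exporter}))  // Running: sidecars keep it alive
	fmt.Println(podPhase(Never, []Container{runnerDone,
		{Name: "docker", Exited: true}, {Name: "exporter", Exited: true}})) // Succeeded
}
```

In this model, the pod only reaches Completed when every container exits cleanly and none is restarted.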
Here's the behaviour captured from the runner when using the myoung34 image and setting EPHEMERAL in the env vars:
As you can see, it restarts the runner container, but the pod does not go into the Completed state. The whole time, I was getting these log messages on the operator:
Let me know if you need anything else.
So I have reprovisioned the cluster, as there seemed to be some lingering resources with bad configuration breaking things. The runner container is restarting, and that seems to clear down the …
I read through the code briefly (I'm not very experienced with Go), but it looks like it's because this isn't returning true: github-actions-runner-operator/controllers/podrunner_types.go, lines 73 to 75 in 43c51db
We have one pod running at the moment, but I'm wondering whether the myoung34 derivate image is missing something that stops the operator from recognising it as a runner pod.
Hi @davidkarlsen, any update on this?
@joshrichards37
@davidkarlsen I am having the same issue. Running the myoung34 derivate image, the operator doesn't recognize the GitHub runner and isn't able to scale the runners.
Hi there,
I am in the process of implementing the operator in our k8s cluster, and everything has been great and straight forward so far.
I just have a question about the ephemerality of the pods. I have tried using the myoung34 derivate of the container image and passing the `EPHEMERAL` env var through. This does seem to restart the runner container, which is great. However, it does not restart the pod, which means the emptyDir volumes don't get recreated and their data persists on the cluster node. Using the myoung34 derivate also doesn't seem to work with the runner reconciliation, so autoscaling isn't working for me right now with the derivate. Here are some logs when using the derivate:
When running some tests using the master image, the behaviour seems to be:
This is great if we don't have many jobs waiting to be processed. However, sometimes we have tens of jobs queued, and we don't want to risk running out of disk space on our cluster nodes. We are looking at implementing Karpenter in the future to handle scaling of the cluster nodes, but we don't have time to do so right now.
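The disk-space concern can be made concrete with a rough upper bound. emptyDir data lives as long as the pod, not the container, so a runner container restarting between jobs frees nothing; only replacing the pod starts from empty volumes.

The config above uses only *preferred* pod anti-affinity, so several runners can still land on the same node. The sketch below is hypothetical. It assumes every job leaves a fixed amount of data in `runner-work` and `docker-storage` and that nothing is cleaned up between jobs. The 2 GiB per job and all 6 runners (maxRunners) on one node are illustrative numbers, not measurements.

```go
package main

import "fmt"

// nodeEmptyDirGiB is a rough upper bound on emptyDir usage on one node.
// emptyDir volumes live as long as the pod, so a runner container restarting
// between jobs frees nothing; only replacing the pod starts from empty volumes.
// Hypothetical assumptions: each job leaves gibPerJob behind and nothing is
// cleaned up between jobs.
//
//	runnersOnNode: runner pods scheduled on the same node
//	jobsPerPod:    jobs a pod runs before it is replaced (1 if truly ephemeral)
func nodeEmptyDirGiB(runnersOnNode, jobsPerPod int, gibPerJob float64) float64 {
	return float64(runnersOnNode*jobsPerPod) * gibPerJob
}

func main() {
	// Illustrative numbers only: 6 runners on one node, 2 GiB per job.
	fmt.Printf("pod replaced after every job: %.0f GiB\n", nodeEmptyDirGiB(6, 1, 2))
	fmt.Printf("pod reused for 10 jobs:       %.0f GiB\n", nodeEmptyDirGiB(6, 10, 2))
}
```

With these placeholder numbers, the bound is 12 GiB when pods are replaced after every job and 120 GiB when each pod serves 10 jobs. This is why recreating the pod, rather than just the container, matters for disk usage.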
Is there a way right now to make the master image behave in an ephemeral way by recreating the pod and emptyDirs when the job has finished?
Thanks in advance
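One possible workaround, sketched under stated assumptions: a small controller or periodic job could delete runner pods that have already finished a job and are currently idle. It would then rely on the operator's minRunners/maxRunners reconciliation to create fresh pods, which come with new emptyDir volumes. Whether the operator recreates deleted pods this way is an assumption, not something confirmed in this thread.

`RunnerPod`, its fields, `podsToRecycle`, and the pod names are hypothetical. A real implementation would read the runner container's `restartCount` from the pod status and the `busy` flag from the GitHub self-hosted runners API.

```go
package main

import "fmt"

// RunnerPod is a hypothetical, simplified view of a runner pod. A real
// controller would fill RunnerRestarts from the pod's containerStatuses
// (restartCount of the "runner" container) and RunnerBusy from GitHub.
type RunnerPod struct {
	Name           string
	RunnerRestarts int  // >0 means the ephemeral runner exited after a job at least once
	RunnerBusy     bool // true while the runner is executing a job
}

// podsToRecycle picks pods that have already finished a job and are idle.
// Deleting them (not shown) would let the pool reconciliation create fresh
// pods with new emptyDir volumes. Note the race: an idle runner may pick up
// a new job between this check and the deletion.
func podsToRecycle(pods []RunnerPod) []string {
	var names []string
	for _, p := range pods {
		if p.RunnerRestarts > 0 && !p.RunnerBusy {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	pods := []RunnerPod{
		{Name: "runner-a", RunnerRestarts: 1},                   // finished a job, idle
		{Name: "runner-b", RunnerRestarts: 2, RunnerBusy: true}, // mid-job
		{Name: "runner-c"},                                      // fresh, no job yet
	}
	fmt.Println(podsToRecycle(pods)) // [runner-a]
}
```

Skipping busy pods avoids killing a job mid-run, and skipping fresh pods avoids churning runners that have not yet used their volumes.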