
Document elastic cluster scaling down #41

Draft: guillaumeeb wants to merge 2 commits into main

Conversation

guillaumeeb (Member)

Pending some questions for @micafer.

sebastian-luna-valero (Collaborator) left a comment:


My suggestions below.

An additional question: after deploying daskhub, I usually see user-scheduler and image-cleaner pods sitting on worker nodes indefinitely. I guess that's by design, but I wonder whether that will prevent the cluster from shrinking elastically?
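For reference, a quick way to check what is actually sitting on a given worker node (the node name wn2 below is just a placeholder for whatever your nodes are called):

# list every pod currently scheduled on that node, across all namespaces
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=wn2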

Thanks!

EGI.md: review suggestions (outdated, resolved)
Co-authored-by: Sebastian Luna-Valero <[email protected]>
guillaumeeb (Member, Author)

Thanks for the review. I need to dig a bit deeper after @micafer's answer:

There are some pods that are created using a K8s type called DaemonSet; in this case there will be one pod deployed on each available node.
CLUES ignores these pods when deciding whether to mark a node as "used", so on nodes 2 and 3 there must be some other pods that CLUES cannot ignore.
So you can try to "pack" the pods onto one node, using the commands "kubectl drain" and "kubectl cordon" to free the nodes (see the sketch below).
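A minimal sketch of what that packing could look like, assuming the nodes to free are called wn2 and wn3 (node names are placeholders, take the real ones from kubectl get nodes):

# stop new pods from being scheduled on the nodes we want to empty
kubectl cordon wn2
kubectl cordon wn3

# evict the remaining non-DaemonSet pods so they get rescheduled on the other node
kubectl drain wn2 --ignore-daemonsets
kubectl drain wn3 --ignore-daemonsets

If a node is needed again later, kubectl uncordon wn2 makes it schedulable once more.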

sebastian-luna-valero (Collaborator)

Do you have the code/notebook with the workload, so I can rerun it on my end and check whether I can help as well?
Thanks!

guillaumeeb (Member, Author)

Sure, I'm just using the notebook from this repo: the import packages part, then jumping straight to the "Setup Dask gateway cluster" section.

Just use a bigger number for the Dask worker memory, and scale a bit more:

# ask for bigger workers and more of them, so extra nodes have to be provisioned
cluster = gateway.new_cluster(worker_memory=8, worker_cores=2)
cluster.scale(18)
