Using GPU-optimized NGC images as base for ML (Pytorch/Tensorflow) docker images #457
Comments
Just to note:
|
Yes, I noticed that too just yesterday :D That looks to be built on top of pangeo-docker-images/base-image/Dockerfile Lines 65 to 77 in 7d4e51e
That |
Consolidating some of the discussion @ngam had around using NVIDIA GPU Cloud (NGC) containers as the base image for `pytorch-notebook` and `ml-notebook`, and potentially `cupy` (#322).

Is your feature request related to a problem? Please describe.
Machine learning and data analytics work that relies on NVIDIA Graphics Processing Units (GPUs) can benefit from several driver- and hardware-related optimizations that speed up processing workflows. Currently, the `pytorch-notebook` and `ml-notebook` docker images rely on CUDA libraries from conda-forge, which are less optimized than what exists on NGC.

Describe the solution you'd like
Refactor `pytorch-notebook` and `ml-notebook` to be based on NGC containers instead of the current `base-image`. This might involve flipping the current installation pipeline from Pangeo-first/ML-second (`base-notebook` -> `pangeo-notebook` -> `ml-notebook`) to ML-first/Pangeo-second (`ngc` -> `ml-notebook` -> `pangeo-notebook`). Something that can help with this is a `pangeo-notebook` metapackage (#359).

Describe alternatives you've considered
Spin things off into a different repository (`pangeo-gpu-docker-images`?), or have a separate build chain (`ngc-pytorch-notebook`, `ngc-ml-notebook`) from the current CI/CD infrastructure.

Additional context
One benefit of changing the build order to ML-first/Pangeo-second is that ML folks who don't need all of the heavy Climate/Ocean packages in `pangeo-notebook` can get a slimmer `ml-notebook`. For example, if they're deploying a model to some server API, they can base their docker image on `ngc-ml-notebook` instead of the current heavy `ml-notebook`.

The disadvantage is that the refactoring will require some effort, and we need to be careful to ensure this doesn't affect existing JupyterHub deployments.
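
To make the two build orders easier to compare, here is a minimal Python sketch that walks each image back to its root base. The parent maps are hypothetical and only mirror the chains described in this issue; they are not read from the repository's Dockerfiles or CI configuration, and `ngc` is a stand-in for whichever NGC container would be chosen.

```python
# Hypothetical parent maps: each image name maps to the image it builds FROM.
# These mirror the chains described in this issue, not the actual Dockerfiles.
CURRENT = {
    "pangeo-notebook": "base-notebook",
    "ml-notebook": "pangeo-notebook",
}
PROPOSED = {
    "ml-notebook": "ngc",  # "ngc" is a placeholder for the chosen NGC container
    "pangeo-notebook": "ml-notebook",
}


def build_chain(parents, image):
    """Return the images from the root base up to `image`, in build order."""
    chain = [image]
    while chain[-1] in parents:
        parent = parents[chain[-1]]
        if parent in chain:
            raise ValueError(f"cycle detected at {parent!r}")
        chain.append(parent)
    return chain[::-1]


for label, parents in (("current", CURRENT), ("proposed", PROPOSED)):
    print(label + ":", " -> ".join(build_chain(parents, "ml-notebook")))
```

Under these assumed maps, `ml-notebook` no longer inherits from `pangeo-notebook` in the proposed order, which is where the slimmer image would come from.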