Misc Dockerfiles updates to reduce image footprints #1363
Signed-off-by: Abolfazl Shahbazi <[email protected]>
Dependency Review: ✅ No vulnerabilities or license issues found.
In general this saves too little, please see:
- Analysis (from May): Why containers use hundreds of MBs for Vim/Perl/OpenGL? #225
- => Staged build with other optimizations can halve the image size, 1/4 reduction is not enough
- Example of doing this for all apps, not just one: Use staged builds to minimize final image sizes #1031
- Used to get CI test runs for unified Dockerfiles, as a test-step for using base images
- New shared base image: Add Dockerfile for comps-base image GenAIComps#1127
- Using base image for all apps: Use GenAIComp base image to simplify Dockerfiles & reduce image sizes #1369
libgl1-mesa-glx \
libjemalloc-dev \
git
libjemalloc-dev && \
Why are these installed? Nothing should be using Mesa / 3D libs, and jemalloc would need to be enabled explicitly, e.g. with LD_PRELOAD, which I do not see here.
If those are dependencies of some pip package, pip should take care of it, shouldn't it?
No, unfortunately pip won't take care of development packages.
What actually needs these specific packages, and why?
I mean, pip install in that Dockerfile works fine without either of them, the application itself is Python and does not link such libs, and I do not see any problem with the app when those libs are missing...
I tracked through Git history, across file renames/moves, to when these were first used. However, neither the related PRs nor the files in them mention why these were added. See e.g. #136
There were a few earlier Dockerfiles that already apt-get these, which used langchain as base, but they were running something other than ChatQnA. So it seems these items were just blindly copy-pasted, without any justification, and were maybe used just for some development activities, instead of being needed in production.
@ashahba Could you find out whether there is some justification for adding these?
(If yes, they should be included in the base image after all, with the reason documented there.)
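One way to check the claim that nothing in the image links these libraries is to scan the installed shared objects. The following is only a minimal sketch: the helper name `find_linked_libs` is hypothetical, it assumes `ldd` is available in the image, and it would not catch libraries that are loaded at run time via dlopen or LD_PRELOAD.

```shell
# Hypothetical helper (not part of the PR): list shared objects under DIR
# whose dynamic dependencies, as reported by ldd, match the extended
# regular expression PATTERN.
find_linked_libs() {
    dir=$1
    pattern=$2
    find "$dir" -type f -name '*.so*' 2>/dev/null | while IFS= read -r so; do
        # ldd prints one dependency per line; -q keeps grep silent
        if ldd "$so" 2>/dev/null | grep -Eq "$pattern"; then
            printf '%s\n' "$so"
        fi
    done
}

# Example use inside the built container (path is an assumption):
#   find_linked_libs /usr/local/lib/python3.11/site-packages 'libGL|jemalloc'
```

If this prints nothing for the site-packages directory, no installed extension module is directly linked against Mesa's libGL or jemalloc.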
ARG GenAICompsRepo="https://github.com/opea-project/GenAIComps"
ARG GenAICompsBranch="main" # Branch name or Commit ID

RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
    curl

RUN curl -sSL --retry 5 ${GenAICompsRepo}/tarball/${GenAICompsBranch} | tar --strip-components=1 -xzf -
Using the base image from the GenAIComps project is the final goal. These optimizations are unnecessary there, and a shared base image gives >10x space improvement when all application images are present.
=> I would skip this and wait for the base image, so that Dockerfiles can be (greatly) simplified to reduce the resulting image size.
I compared using curl against git clone --depth 1 used in my PR: #1031
Both took about 6 secs (through company VPN), which was a bit disappointing.
I would assume that curling a compressed tarball uses a bit less bandwidth than Git's "packed" data though.
The extracted tarball took 24MB of space. The Git clone included an extra 14MB in the .git subdir, but that would not go to the final target, it's only in the temporary layer.
However, there's a rather large difference in installing curl vs. git on top of python:3.11.
Building the curl image took 9s, and the git one 18s, probably due to extra dependencies / their sizes (which won't be in the final image):
$ docker images |grep -E "(git|curl)-test"
localhost/git-test latest d5dfb6922688 2 minutes ago 239 MB
localhost/curl-test latest 0e44588dadfa 3 minutes ago 158 MB
=> I'll add a note about making this change to my PR, if I need to update it again.
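For reference, the gap between the two test images listed above is 239 MB - 158 MB = 81 MB. A small sketch for computing such a difference from an image listing is given here; the function name `image_size_diff` is hypothetical, and it assumes both sizes are reported in the same unit (MB), either as `239 MB` like in the listing above or as `239MB`.

```shell
# Hypothetical helper: read an image listing on stdin and print the size
# of image A minus the size of image B. Assumes both sizes use the same unit.
image_size_diff() {
    awk -v a="$1" -v b="$2" '
        # The size is the last field ("239MB"), or the one before it when
        # the unit is printed as a separate field ("239 MB").
        function size(   s) { s = ($NF ~ /^[0-9]/) ? $NF : $(NF - 1); return s + 0 }
        $1 == a { sa = size() }
        $1 == b { sb = size() }
        END { print sa - sb }
    '
}

# Example use:
#   docker images | image_size_diff localhost/git-test localhost/curl-test
```

With the listing above as input, this prints 81.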
COPY --from=devel /usr/local/lib/${PYTHON}/site-packages /usr/local/lib/${PYTHON}/site-packages
COPY --from=devel /usr/local/bin /usr/local/bin
COPY --from=devel /home/user/GenAIComps /home/user/GenAIComps
When I tested using a separate stage for the ChatQnA pip install, and copying the files from there, like you do here, it saved only a couple of MB. With a shared base image, that's pretty insignificant, so I would skip it for simplicity.
Or are you seeing larger savings?
Please double check your setup.
I did builds with and without pip install being done in a separate step, and the image size difference reported by Docker was just a few MB. How did you test it, and what difference did you see?
(I tested it with the comps base image, but that should not matter as it does exactly the same install step.)
When caching is disabled, pip should clean out any download artefacts after installation, so I think the only size difference would come from the pip & setuptools upgrades, and those are only a couple of MB in size. Which corresponds to what I saw...
Are you sure you're not confusing the size difference with the disk space used for the curl install? apt-get update itself takes some space, and curl + its deps take ~4.5MB.
(I'm asking because if there is a significant size difference, then I should consider the same thing for the base image.)
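To attribute disk usage to an individual step (the curl install vs. the pip install), one option is to measure a directory before and after running that step inside a throwaway container. This is only a sketch: the helper name `size_delta_kb` is hypothetical, and it assumes `du`, `awk`, and `sync` are available in the image.

```shell
# Hypothetical helper: print how many KB the given command adds under DIR.
# Usage: size_delta_kb DIR COMMAND [ARGS...]
size_delta_kb() {
    dir=$1
    shift
    # -x stays on one filesystem, so /proc and other mounts are skipped
    before=$(du -skx "$dir" 2>/dev/null | awk '{ print $1 }')
    "$@" >&2    # run the step; keep its output off stdout
    sync        # flush pending writes so du sees the allocated blocks
    after=$(du -skx "$dir" 2>/dev/null | awk '{ print $1 }')
    echo $((after - before))
}

# Example use inside a python:3.11 container (run as root):
#   size_delta_kb / sh -c 'apt-get update -y && apt-get install -y --no-install-recommends curl'
```

Running the apt-get and pip steps through the same helper would show which one accounts for most of the difference.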
RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
ENV LANG C.UTF-8
Python base images already include this workaround for Python versions needing it: https://github.com/docker-library/python/blob/master/3.11/slim-bookworm/Dockerfile
(LANG env is removed in 3.13 version.)
So I do not think such an override is needed for the base image. Do you agree?
Description
This PR leverages multi-stage builds and a few other BKMs to reduce Docker image footprints as much as possible, while sticking with the same base images and ensuring functionality remains intact.
Once the two images (one from the main branch and one using the Dockerfile provided in this PR) are built, the results show a 20%+ image size reduction.
There are a few other enhancements, like introducing build-args for the GenAIComps repo and branch names to be cloned, which improves the local development and testing experience and is less error-prone, since it doesn't require modifying the Dockerfiles themselves.
My goal is not to get this PR merged, but I would like some early feedback so that we can collect other BKMs into one place before going after all Dockerfiles and deciding to improve every single one of them.
Issues
This is a known issue.
Type of change
Dependencies
N/A
Tests
This is a WIP PR and there is a CI test to cover it as well.