Lazy container pull (aka only download image when it is really needed) #1745

Closed
igorgatis opened this issue Mar 6, 2021 · 9 comments
Labels
Can Close? Will close in 30 days unless there is a comment indicating why not

Comments

igorgatis commented Mar 6, 2021

I'd like to be able to list several docker images as part of my WORKSPACE but only download the actual image if the target I'm building needs it.

Right now, if you list a container image in the WORKSPACE, it is downloaded even for bazel queries.

How hard is it to have a lazy version of container_pull?

This is a problem at my company. We have a monorepo with several large container images. Even though each developer spends most of their time targeting a specific container, eventually they need to download everything, for example when a presubmit hook runs a bazel query on deps to figure out which tests are affected. This has been a growing pain.

igorgatis commented Mar 8, 2021

Good news: I have a working prototype. The general idea is rather simple:

  • Modified puller by adding a -metadata-only flag, which skips downloading the layers.
  • Introduced a lazy_download parameter to container_pull, which is false by default.
  • When lazy_download is true, puller downloads only the metadata, and a genrule that fetches the layers is added to the generated BUILD file.
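
For illustration, opting in from the WORKSPACE might look like the sketch below. It assumes the prototype's proposed lazy_download attribute, which is not part of any released rules_docker version. The repository name elasticsearch_base is made up for this example; the registry, repository, and tag values are taken from the generated BUILD file in this comment.

```python
# WORKSPACE (Starlark). Sketch only: lazy_download is the attribute proposed
# in this prototype, not an attribute of released rules_docker.
load("@io_bazel_rules_docker//container:container.bzl", "container_pull")

container_pull(
    name = "elasticsearch_base",  # hypothetical repository name
    registry = "someregistry",
    repository = "elasticsearch/elasticsearch",
    tag = "v1",
    lazy_download = True,  # proposed: fetch only metadata when the repository is fetched
)
```

A target would then depend on @elasticsearch_base//image as usual. Under the prototype, the layer tarballs would only be fetched when such a target is actually built.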

The generated BUILD file looks like this:

package(default_visibility = ["//visibility:public"])
load("@io_bazel_rules_docker//container:import.bzl", "container_import")

genrule(
    name = "download",
    message = "Fetching",
    outs = ["000.tar.gz", "001.tar.gz", "002.tar.gz", "003.tar.gz", "004.tar.gz", "005.tar.gz", "006.tar.gz", "000.sha256", "001.sha256", "002.sha256", "003.sha256", "004.sha256", "005.sha256", "006.sha256"],
    cmd = "$(location @io_bazel_rules_docker//container/go/cmd/puller:puller) -directory $(RULEDIR) -os linux -os-version  -os-features  -architecture amd64 -variant  -features -name someregistry/elasticsearch/elasticsearch:v1 -timeout 6000",
    tools = ["@io_bazel_rules_docker//container/go/cmd/puller:puller"],
)

container_import(
    name = "image",
    config = "config.json",
    layers = ["000.tar.gz", "001.tar.gz", "002.tar.gz", "003.tar.gz", "004.tar.gz", "005.tar.gz", "006.tar.gz"],
    base_image_registry = "someregistry",
    base_image_repository = "elasticsearch/elasticsearch",
    base_image_digest = "sha256:6c36fa585104d28bba9e53c799a4e20058445476cadb3b3d3e789d3793eed10a",
    tags = [""],
)

exports_files(["image.digest", "digest"])

When lazy_download is false, here is the generated BUILD file:

package(default_visibility = ["//visibility:public"])
load("@io_bazel_rules_docker//container:import.bzl", "container_import")

container_import(
    name = "image",
    config = "config.json",
    layers = ["000.tar.gz", "001.tar.gz", "002.tar.gz", "003.tar.gz", "004.tar.gz", "005.tar.gz", "006.tar.gz"],
    base_image_registry = "someregistry",
    base_image_repository = "elasticsearch/elasticsearch",
    base_image_digest = "sha256:6c36fa585104d28bba9e53c799a4e20058445476cadb3b3d3e789d3793eed10a",
    tags = [""],
)

exports_files(["image.digest", "digest"])
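
The only difference between the two generated files is the genrule. A minimal sketch of how a repository rule could render either variant follows; it is not the code from the PR. The name render_build_file and its parameters are hypothetical, and the genrule is abbreviated (the generated one above passes more puller flags and also lists the .sha256 outputs). The function sticks to the subset of Starlark that is also valid Python, so it can be run standalone.

```python
def render_build_file(layers, puller_label, image_name, lazy):
    """Returns BUILD file text; adds a download genrule only when lazy is True.

    Hypothetical helper, not the implementation in container/pull.bzl.
    """
    quoted_layers = ", ".join(['"%s"' % layer for layer in layers])
    lines = [
        'package(default_visibility = ["//visibility:public"])',
        'load("@io_bazel_rules_docker//container:import.bzl", "container_import")',
        "",
    ]
    if lazy:
        # The layers become genrule outputs, so Bazel runs the puller only
        # when a target that needs those files is built.
        lines += [
            "genrule(",
            '    name = "download",',
            "    outs = [%s]," % quoted_layers,
            '    cmd = "$(location %s) -directory $(RULEDIR) -name %s",' % (puller_label, image_name),
            '    tools = ["%s"],' % puller_label,
            ")",
            "",
        ]
    lines += [
        "container_import(",
        '    name = "image",',
        '    config = "config.json",',
        "    layers = [%s]," % quoted_layers,
        ")",
    ]
    return "\n".join(lines) + "\n"
```

Calling it with lazy = False yields only the container_import, which matches the shape of the non-lazy file above.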

Here is the list of changed files:

 container/go/cmd/puller/puller.go | 15 +++++++++++----
 container/go/pkg/compat/write.go  |  6 +++---
 container/pull.bzl                | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 65 insertions(+), 16 deletions(-)

It would be a HUGE time saver and pain reliever for my development team. Would you guys consider this a useful feature worth sending a PR for?

igorgatis commented Mar 8, 2021

Here is the PR: #1749

igorgatis changed the title from "Lazy container load (aka only download image when it is really needed)" to "Lazy container pull (aka only download image when it is really needed)" on Mar 10, 2021
@smukherj1
Collaborator

FYI @alexeagle @pcj @gravypod

Thanks for filing the issue @igorgatis and sorry for the delay getting back here. We have some new community maintainers for rules_docker (cc'd) ramping up and we should be able to start addressing open issues and PRs within a few weeks.

The major blocker right now is that the CI setup for the e2e tests needs to be fixed, and those tests are crucial for validating puller and pusher functionality. Once the e2e tests are up and running again, reviewing your PR should be unblocked.

@igorgatis
Author

Any news?

@github-actions

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_docker!

github-actions bot added the "Can Close?" label (will close in 30 days unless there is a comment indicating why not) on Jan 10, 2022

dashjay commented Jan 10, 2022

Can we push some container_layers directly and define a function, container_manifest, so that we can commit an image much like a git rebase?

github-actions bot removed the "Can Close?" label (will close in 30 days unless there is a comment indicating why not) on Jan 11, 2022

dashjay commented Jan 24, 2022

I'd like to provide a tool named 'image_rebaser'. We could call it like image_rebaser base:tag target:tag layer1.tar layer2.tar ..... The tool just pushes all the tars as blobs and commits a manifest. (We need to copy all the base image blobs to the target repository if they do not already exist there, or if they cannot be mounted.)

But how do we make it compatible with container_bundle or containerd_push? That is the troublesome part.

@github-actions

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_docker!

github-actions bot added the "Can Close?" label (will close in 30 days unless there is a comment indicating why not) on Jul 24, 2022
@github-actions

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"
