Parsing repo info #41
Comments
I made a short Python script to extract some of the info I was looking for from the MD files. It can be seen here: https://gist.github.com/spyoungtech/3506d5ae9a9888ec709c8fcad33cfc34 It could be extended to parse additional info, I suppose.
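As a rough illustration of the kind of extraction described above (not the linked gist itself), here is a minimal sketch. It assumes each repo info MD file has a level-two heading naming the tag in backticks, such as "## `debian:buster`", followed somewhere in that section by a line of the form "$ docker pull debian@sha256:<digest>". The function name `parse_repo_info` and the returned dictionary keys are hypothetical.

```python
import re

# Assumed layout (not confirmed by the thread): a heading such as
# "## `debian:buster`" followed by a "$ docker pull name@sha256:..." line.
HEADING_RE = re.compile(r"^## `([^`]+)`\s*$", re.MULTILINE)
PULL_RE = re.compile(
    r"^\$ docker pull (\S+)@(sha256:[0-9a-f]{64})$", re.MULTILINE
)


def parse_repo_info(markdown):
    """Return one {"tag", "repo", "digest"} dict per tagged section.

    Sections without a recognizable pull line are skipped.
    """
    entries = []
    headings = list(HEADING_RE.finditer(markdown))
    for i, heading in enumerate(headings):
        # A section runs from this heading to the next one (or end of file).
        if i + 1 < len(headings):
            end = headings[i + 1].start()
        else:
            end = len(markdown)
        section = markdown[heading.end():end]
        # Only the first pull line in the section is used.
        pull = PULL_RE.search(section)
        if pull:
            entries.append(
                {
                    "tag": heading.group(1),
                    "repo": pull.group(1),
                    "digest": pull.group(2),
                }
            )
    return entries
```

The resulting list can be dumped with `json.dumps` if a JSON form of the data is wanted. If the real files deviate from the assumed layout, the two regular expressions are the only parts that should need adjusting.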
Yeah, this is something I've thought a lot about (and that I know a few people have done privately in the past). Ideally, this whole repository would be replaced by a real proper service with an API (and boatloads of historical data), but that's a bit more complicated. 😅 Some of this is something Docker Hub really ought to provide out of the box (IMO), but it's not exactly straightforward for them to do so, I think.
@tianon yeah, this is more or less the exact thing I set out to do and accomplished :-) Thanks to this project, I was able to get historical information, at least for official Docker images. Registries on their own aren't well suited to the tasks we had in front of us, and they obviously can't relate information across registries. Amazon's ECR even times out on certain operations for sufficiently large repositories. Anyhow, I ended up putting the information from this project into a Postgres database. The main use case for my project was to be able to examine Docker images and determine the base image(s) used and their historical tags. For example, like this:
In [23]: elixer
Out[23]: <Image: elixir@sha256:c3ee088c737bf55150dc5da229ca69e92c5a31eb6ba9da976ade722942c885d3>
In [24]: for base in elixer.bases():
    ...:     print(base)
    ...:     print(base.tags)
debian@sha256:0ba0446bc007a3196501ecbe91aabd4193db81085b23f4a99685448445762396
['10.9', '10', 'buster-20210511', 'buster', 'latest']
erlang@sha256:d5d8e6be8de1b9e7946c6ed1a6278db1a60ae8ee89c0a65e7165238145ff9b54
['24-slim', '24.0-slim', '24.0.1-slim', '24.0.1.0-slim', 'slim']
(Django ORM)
So, for example, this particular elixir image was evidently derived from the debian and erlang images listed above. All the other usual information about an image is also available, exposed through the REST API. I may be able to open source this in the future or rebuild something similar in the open source space.
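The comment does not say how the bases are determined. One common approach, assumed here purely for illustration, is to treat an image as a base of another when its ordered layer list is a proper prefix of the other image's layer list. The sketch below uses plain dictionaries rather than the Django models from the session above; `find_bases` and its arguments are hypothetical names.

```python
def find_bases(target_layers, known_images):
    """Return digests of known images whose layers prefix target_layers.

    target_layers: ordered list of layer digests for the inspected image.
    known_images: dict mapping an image digest to its ordered layer list.
    Assumption: a base image's layers appear unchanged, in order, at the
    start of any image derived from it.
    """
    bases = []
    for digest, layers in known_images.items():
        # Require a non-empty, strictly shorter list so an image is
        # never reported as its own base.
        is_shorter = 0 < len(layers) < len(target_layers)
        if is_shorter and target_layers[: len(layers)] == layers:
            bases.append(digest)
    # Shortest prefix first, so the most distant ancestor is listed first.
    bases.sort(key=lambda d: len(known_images[d]))
    return bases
```

With layer lists taken from stored repo info, the digests returned could then be looked up against historical tag data, giving output similar in spirit to the session above.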
@spyoungtech Did you end up publishing this somewhere?
Thank you so much for your work on this.
I'm wondering if there's a known tool or method to parse the repo info MD files or if the data is available in another format, like JSON.
I want to parse this data to build a tool that helps Docker users reconstitute old Docker builds. For example, by inspecting an image, being able to determine the tag(s)/SHA256 image digest of the upstream base image(s). Any thoughts or suggestions would be greatly appreciated.