Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve dependencies versioning and ci #141

Merged
merged 16 commits into from
Jan 21, 2025

Conversation

LeoGrin
Copy link
Collaborator

@LeoGrin LeoGrin commented Jan 16, 2025

  • Adds minimum (found on my mac M3 using py3.9 and py3.10, all <= 2022 expect torch which I didn't change) and maximum version to all the dependencies. This prevent failures when a dependency makes a breaking change
  • Add CI: dependabot to update the max dependency, and some actions to run the tests on the minimum and maximum versions of the dependencies (on mac and ubuntu).
  • Run the pre-commit on all files

@LeoGrin LeoGrin force-pushed the improve_dependencies_versioning_and_ci branch from 4b923a0 to 55aefe5 Compare January 16, 2025 17:07
@LeoGrin
Copy link
Collaborator Author

LeoGrin commented Jan 16, 2025

@noahho I'm pushing the maximum python version from 3.11 to 3.12 (the tests are passing). Is there any reason you set it to 3.11 instead of 3.12?

@LeoGrin LeoGrin requested a review from eddiebergman January 16, 2025 17:35
@LeoGrin LeoGrin mentioned this pull request Jan 20, 2025
@LeoGrin LeoGrin requested review from LennartPurucker and removed request for eddiebergman January 21, 2025 12:20
Copy link
Collaborator

@LennartPurucker LennartPurucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments, other LGTM

.github/workflows/pull_request.yml Outdated Show resolved Hide resolved
pyproject.toml Outdated
Comment on lines 9 to 15
"torch>=2.1,<=2.5.1",
"scikit-learn>=1.2.0,<=1.6.1",
"typing_extensions>=4.4.0,<=4.12.2",
"scipy>=1.7.3,<=1.15.1",
"pandas>=1.4.0,<=2.2.3",
"einops>=0.2.0,<=0.8.0",
"huggingface-hub>=0.0.1,<=0.27.1",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the current max versions or the end of our support versions?
IMO, as long as we do not have an upper limit, we should not define one.

But I think this is up to the taste of the maintainer and boils to one of the following choices:

  • Update this file every time any of the unlimited packages are updated, or
  • update this file only when something breaks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the current max versions. My thinking was that it would prevent failures for users, and dependabot would update these max versions weekly (and then we could merge in one click if it works, otherwise investigate).
But yeah I'm not sure, not specifying the max version does reduce the maintainer work a little bit, and could be combined with some other github action which checks that things are working with the latest version.

Copy link
Collaborator

@eddiebergman eddiebergman Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to chime in:

Both options have tradeoffs and I want to be a bit more explicit about them:

  • No upperbound (Max compatibility with other libs, more immediate issues):

You effectively are at the whim of the lib to not break their API. This rarely happens with well maintained libraries except in X changes of a version (X.Y.Z, see semvar). However it will eventually happen, depending on how much functionality you use of a library. For example, with numpy, they're very unlikely to change how np.add works and if that's all you use, then you shouldn't need to upperbound it. However if you were to rely on numpy's C bindings, then they may change it as they did in the bump to 2.0.0. The upside of this is that if a user installs a higher version and it just works, then they don't have to come ask you to bump it.

  • Explicit Upperbound (Limited compatibility, less immediate issues):

You effectively specify a set of dependencies which guarantee you library works (with cave-eats such as pre-existing bugs in existing versions within bounds). The downside here is that if the user uses a library that needs a library version with a lower bound such as 2.6.0 and you upperbound on 2.5.0, then they can not use your library alongside their other one, and you will likely get a request to up it at some point.

So the tradeoff is when do you want to patch a version change? With no upperbound, it will happen immediately for a user when an issue occurs and they will raise an issue, at which point you recommend downgrading the conflict (if they can, same problem you get with Explicit upperbound but harder to debug.

With Explicit upperbound, you save the user from unknown dependancy hell, replacing it with known dependancy hell. You also save yourself immediate dependency management, but as things go with time, eventually your upperbounds become the lowerbounds of the eco-system, and you will have to solve them bit by bit. You also prevent perfectly compatible libraries from working when they otherwise would.


Given that, my recommendation and strategy is use your vibe-checks to know which libraries to upperbound and which to not. For example, I would very comfortably set numpy to have an upperbound on X, i.e. numpy<3, as they are unlikely to break anything anytime soon. Likewise with torch, but you need to consider that torch is usually a lot more unstable due to the hardware eco-system they build upon and the functionality you utilize from torch, such as their attention mechanisms. There are some libraries like typing_extensions which are developed by Python themselves and (almost) never backwards breaking, as the library is pure python and doesn't rely on CPython or otherwise system-level libraries. I would never upper-bound this.

If you were to use something like SMAC (sorry SMAC) or otherwise less production grade libraries, then they are unlikely to follow semvar strictly, and from version to version, they could break things unknowingly, usually not intentionally. These are often the hardest to judge as if several popular libraries depend on it, each of them distrusting it and setting a relatively strict upperbound, this will cause lots of conflicts.

Other considerations is the frequency of library updates. Things like pyyaml are basically complete, in that it is unlikely to update, meaning a stricter upperbound isn't going to cause as much dep maintanence as a package which updates extremely frequently.

---lock files

Recommendation: Upperbound libraries like numpy and torch to their next X. If you are unsure of the development practices of a dependancy, upperbound on their Y. Never upper bound on a Z unless there's a history of issues.

If you want to provide a this setup works assuming no other dependancies, such as in a Docker environment for deployment, then you provide lock files which specify exact dependancies.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @eddiebergman, really interesting!! Based on your comments I made the max versions much laxer: no bound for typing-extension, bound on X for all the rest except scikit-learn and einops, for which bound on Y. Tell me if you disagree! I would say that using these lax upper bounds + dependabot is a good balance between making life easy for maintainers while being robust for users.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well put into words @eddiebergman!

I agree with going forward like this and like the reasoning.
One more comment: I think it is also useful to no upperbound to push ourselves to support higher versions.

@LeoGrin LeoGrin merged commit 7529757 into main Jan 21, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants