-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve dependencies versioning and ci #141
Conversation
LeoGrin
commented
Jan 16, 2025
•
edited
Loading
edited
- Adds minimum (found on my mac M3 using py3.9 and py3.10, all <= 2022 expect torch which I didn't change) and maximum version to all the dependencies. This prevent failures when a dependency makes a breaking change
- Add CI: dependabot to update the max dependency, and some actions to run the tests on the minimum and maximum versions of the dependencies (on mac and ubuntu).
- Run the pre-commit on all files
4b923a0
to
55aefe5
Compare
@noahho I'm pushing the maximum python version from 3.11 to 3.12 (the tests are passing). Is there any reason you set it to 3.11 instead of 3.12? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small comments, other LGTM
pyproject.toml
Outdated
"torch>=2.1,<=2.5.1", | ||
"scikit-learn>=1.2.0,<=1.6.1", | ||
"typing_extensions>=4.4.0,<=4.12.2", | ||
"scipy>=1.7.3,<=1.15.1", | ||
"pandas>=1.4.0,<=2.2.3", | ||
"einops>=0.2.0,<=0.8.0", | ||
"huggingface-hub>=0.0.1,<=0.27.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these the current max versions or the end of our support versions?
IMO, as long as we do not have an upper limit, we should not define one.
But I think this is up to the taste of the maintainer and boils to one of the following choices:
- Update this file every time any of the unlimited packages are updated, or
- update this file only when something breaks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the current max versions. My thinking was that it would prevent failures for users, and dependabot would update these max versions weekly (and then we could merge in one click if it works, otherwise investigate).
But yeah I'm not sure, not specifying the max version does reduce the maintainer work a little bit, and could be combined with some other github action which checks that things are working with the latest version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to chime in:
Both options have tradeoffs and I want to be a bit more explicit about them:
- No upperbound (Max compatibility with other libs, more immediate issues):
You effectively are at the whim of the lib to not break their API. This rarely happens with well maintained libraries except in X
changes of a version (X.Y.Z
, see semvar). However it will eventually happen, depending on how much functionality you use of a library. For example, with numpy
, they're very unlikely to change how np.add
works and if that's all you use, then you shouldn't need to upperbound it. However if you were to rely on numpy's C bindings, then they may change it as they did in the bump to 2.0.0
. The upside of this is that if a user installs a higher version and it just works, then they don't have to come ask you to bump it.
- Explicit Upperbound (Limited compatibility, less immediate issues):
You effectively specify a set of dependencies which guarantee you library works (with cave-eats such as pre-existing bugs in existing versions within bounds). The downside here is that if the user uses a library that needs a library version with a lower bound such as 2.6.0
and you upperbound on 2.5.0
, then they can not use your library alongside their other one, and you will likely get a request to up it at some point.
So the tradeoff is when do you want to patch a version change? With no upperbound, it will happen immediately for a user when an issue occurs and they will raise an issue, at which point you recommend downgrading the conflict (if they can, same problem you get with Explicit upperbound but harder to debug.
With Explicit upperbound, you save the user from unknown dependancy hell, replacing it with known dependancy hell. You also save yourself immediate dependency management, but as things go with time, eventually your upperbounds become the lowerbounds of the eco-system, and you will have to solve them bit by bit. You also prevent perfectly compatible libraries from working when they otherwise would.
Given that, my recommendation and strategy is use your vibe-checks to know which libraries to upperbound and which to not. For example, I would very comfortably set numpy to have an upperbound on X
, i.e. numpy<3
, as they are unlikely to break anything anytime soon. Likewise with torch
, but you need to consider that torch
is usually a lot more unstable due to the hardware eco-system they build upon and the functionality you utilize from torch
, such as their attention mechanisms. There are some libraries like typing_extensions
which are developed by Python
themselves and (almost) never backwards breaking, as the library is pure python and doesn't rely on CPython or otherwise system-level libraries. I would never upper-bound this.
If you were to use something like SMAC
(sorry SMAC) or otherwise less production grade libraries, then they are unlikely to follow semvar strictly, and from version to version, they could break things unknowingly, usually not intentionally. These are often the hardest to judge as if several popular libraries depend on it, each of them distrusting it and setting a relatively strict upperbound, this will cause lots of conflicts.
Other considerations is the frequency of library updates. Things like pyyaml
are basically complete, in that it is unlikely to update, meaning a stricter upperbound isn't going to cause as much dep maintanence as a package which updates extremely frequently.
---lock
files
Recommendation: Upperbound libraries like numpy
and torch
to their next X
. If you are unsure of the development practices of a dependancy, upperbound on their Y
. Never upper bound on a Z
unless there's a history of issues.
If you want to provide a this setup works assuming no other dependancies, such as in a Docker environment for deployment, then you provide lock
files which specify exact dependancies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @eddiebergman, really interesting!! Based on your comments I made the max versions much laxer: no bound for typing-extension
, bound on X for all the rest except scikit-learn
and einops
, for which bound on Y. Tell me if you disagree! I would say that using these lax upper bounds + dependabot is a good balance between making life easy for maintainers while being robust for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well put into words @eddiebergman!
I agree with going forward like this and like the reasoning.
One more comment: I think it is also useful to no upperbound to push ourselves to support higher versions.