Support Pandas version 2 #1306
Comments
Technically speaking, Dask-CUDA has no compatibility issues with pandas 2, but for that to be useful you'll also need cuDF to support it. There is ongoing work on that in rapidsai/cudf#13535, which #1213 is waiting on. We are also happy to accept PRs in both Dask-CUDA and cuDF to expand support for libraries that our users need.
Thank you. I suppose that, for this repo, just removing the version constraint in pyproject.toml would help a lot. Currently, that constraint stops us from using pandas 2 in our Dask job at all, even if we don't use cuDF.
I am not necessarily opposed to this, but I do have mixed feelings about it. On the one hand I understand your ask, but ultimately Dask-CUDA is primarily meant to be used with GPU libraries, which in this particular case implies cuDF. Removing the pin would loosely communicate "we support pandas 2 already", which is not true because we can't test it yet. @galipremsagar @shwina @rjzamora @quasiben do you have thoughts on this? Perhaps the current cuDF pin to In any case, the most recent plan is to have pandas 2 support in 24.04, which is due early April.
Thank you for your reply and thank you for considering the request.
I am not sure if that is the characterization everyone uses for Dask-CUDA currently, especially if you consider that cuDF is not even a listed dependency of Dask-CUDA. For example, we use Dask-CUDA for only
I think it's fine to rely on cudf's upper bound for pandas. dask-cuda users who aren't using cudf should be free to use newer versions of pandas if it works for them.
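To illustrate the point about relying on cudf's upper bound, here is a minimal sketch of how the effective pandas constraint depends on which packages are installed. The function names and the bound values are hypothetical and chosen only for illustration; they are not the actual specifiers from either project's pyproject.toml, and real resolvers handle far more version syntax than this.

```python
from __future__ import annotations


def release(version: str) -> tuple[int, ...]:
    """Convert a simple dotted version such as '2.1.4' into (2, 1, 4)."""
    return tuple(int(part) for part in version.split("."))


def pandas_allowed(version: str, upper_bounds: list[str | None]) -> bool:
    """Return True if `version` is below every exclusive upper bound.

    Each entry in `upper_bounds` comes from one installed package;
    None means that package places no upper bound on pandas.
    """
    return all(
        bound is None or release(version) < release(bound)
        for bound in upper_bounds
    )


# Hypothetical exclusive upper bounds on pandas (assumptions, not real pins).
DASK_CUDA_PINNED = "2.0"   # dask-cuda before the pin is lifted
DASK_CUDA_RELAXED = None   # dask-cuda after the pin is lifted
CUDF = "2.0"               # cudf until it supports pandas 2

# dask-cuda alone, with its own pin: pandas 2 is rejected.
print(pandas_allowed("2.1.4", [DASK_CUDA_PINNED]))
# dask-cuda alone, pin lifted: pandas 2 is accepted.
print(pandas_allowed("2.1.4", [DASK_CUDA_RELAXED]))
# dask-cuda plus cudf: cudf's bound still keeps pandas below 2.
print(pandas_allowed("2.1.4", [DASK_CUDA_RELAXED, CUDF]))
```

Under these assumptions, lifting the pin changes nothing for environments that include cudf, and only frees environments that use dask-cuda on its own.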
And this is actually now blocking cudf's ability to test our pandas 2 support with dask, so I'm going to go ahead and open a PR to lift this constraint. Let's hope using the latest pandas doesn't break any of dask-cuda's own tests!
No worries - I'll be happy to investigate anything that breaks :)
dask-cuda uses pandas for some tests, but the main reason for the pinning is that it is inherited from RAPIDS libraries (mainly cudf) that do not yet support pandas 2.0 and are the primary use case for dask-cuda. However, there is no reason dask-cuda cannot be used in other contexts, so relaxing this constraint makes sense.

Resolves #1306

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Lawrence Mitchell (https://github.com/wence-)
- Ray Douglass (https://github.com/raydouglass)

URL: #1308
Thanks @vyasr for taking care of this during my absence.
This was resolved by #1308, closing.
Thank you everyone for such a prompt resolution.
Pandas 2.0.0 was released in April 2023. We should spend some effort to make this project compatible with the 2.x versions. Pandas 3 also has a dev release out, so maybe we can try for that as well now.