Experimental RAPIDS cuDF Backend for Velox #12412

Open
bdice opened this issue Feb 20, 2025 · 0 comments
Labels
enhancement New feature or request

bdice commented Feb 20, 2025

Description

This issue tracks the development of a RAPIDS cuDF backend for Velox. The project will be implemented in a series of incremental pull requests described below. Our initial goal is to run the TPC-H query plans defined by Velox, thereby covering a broad range of basic data analytics operations end-to-end on GPU.

All of the work described below currently exists in a fork of this repository, which we hope to upstream piece by piece. I plan to update this issue frequently as we make progress on the initial stages of this work.

cuDF team: @karthikeyann @devavret @mhaseeb123 @GregoryKimball

cc: @pedroerp @oerling @Yuhta @assignUser

High level design

  • Build:
    • A new Docker container with CUDA 12.8 is needed for developers (based on the existing Ubuntu 22.04 container). 🐳
    • cuDF is fetched and built if the CMake option VELOX_ENABLE_CUDF is true. The dependency logic will go in CMake/resolve_dependency_modules/cudf.cmake. (cuDF only links to the CUDA runtime, not the CUDA driver library.)
    • I plan to work with @assignUser on CI testing. We will probably want to run GPU builds on CPU nodes, and only run GPU CI tests if the cuDF code has changed.
  • Run:
    • A Velox DriverAdapter is used to replace CPU operators with GPU operators that call cuDF's C++ code. This DriverAdapter can be registered at startup by the application using Velox to enable the cuDF backend.
      • Each operator will have a cuDF equivalent. For example, OrderBy will be replaced by CudfOrderBy.
    • Between CPU and GPU operators, a conversion operator is inserted to handle CPU->GPU and GPU->CPU data movement. This allows cuDF operators to be used alongside existing Velox operators.
      • The conversion currently uses Arrow (Velox to Arrow, then Arrow to cuDF). A direct Velox-to-cuDF interop without Arrow may be built in the future for higher performance.
    • Currently, no custom CUDA kernels are needed for this code. All functionality is implemented in pure C++ calling cuDF, which implements the CUDA kernels.
    • There is a lot more to say about tuning for performance (GPU batch sizes, CUDA streams, number of Velox drivers, ...) but I'm leaving that out of this document for the moment.
  • Tests:
    • We have some initial test coverage in place, based on existing Velox tests. We plan to expand the test coverage as we upstream this work.
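The adapter flow described above can be sketched with a toy mock. To be clear, this is not the real Velox or cuDF API: `Op`, `adaptPipeline`, and the conversion-operator names here are hypothetical stand-ins, used only to illustrate the two steps of the pass (swap CPU operators for cuDF equivalents, then insert conversion operators at each CPU/GPU boundary).

```cpp
// Illustrative mock of the DriverAdapter replacement pass.
// All names here are hypothetical; the real Velox/cuDF interfaces differ.
#include <cassert>
#include <string>
#include <vector>

// A stand-in for one operator in a driver's pipeline.
struct Op {
  std::string name;
  bool onGpu;  // true if this mock operator runs via cuDF
};

// Step 1: replace each CPU operator that has a cuDF equivalent.
// Step 2: insert a conversion operator at every CPU<->GPU boundary
// (in the real design, conversion currently goes through Arrow).
std::vector<Op> adaptPipeline(std::vector<Op> ops) {
  for (auto& op : ops) {
    if (op.name == "OrderBy") {
      op = {"CudfOrderBy", true};  // swap in the cuDF equivalent
    }
    // ... HashJoin, FilterProject, aggregations handled likewise
  }
  std::vector<Op> out;
  for (const auto& op : ops) {
    if (!out.empty() && out.back().onGpu != op.onGpu) {
      // Device boundary between adjacent operators: insert a converter.
      out.push_back(
          {op.onGpu ? "CpuToGpuConvert" : "GpuToCpuConvert", op.onGpu});
    }
    out.push_back(op);
  }
  return out;
}
```

For example, a pipeline of `TableScan -> OrderBy -> Sink` would become `TableScan -> CpuToGpuConvert -> CudfOrderBy -> GpuToCpuConvert -> Sink`, which is the property that lets GPU operators coexist with unmodified CPU operators.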

Next steps

Currently, I expect each of the items below to be a separate PR.

  1. Add CUDA Docker Images: build: Add Docker images for CUDA development #12413
  2. Add cuDF to CMake build logic
  3. Incremental feature development in velox/experimental/cudf/. The first few feature PRs will be:
     • cudfDriverAdapter for replacing operators
     • Data conversion (Velox to cuDF and cuDF to Velox, both via Arrow intermediates) and a CudfVector data structure for passing GPU data between operators
     • Operators, probably each in its own PR: HashJoin, OrderBy, aggregations, FilterProject, ...
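As a rough sketch of step 2, the dependency logic in CMake/resolve_dependency_modules/cudf.cmake might look like the following. The option name VELOX_ENABLE_CUDF and the file location come from this issue; the FetchContent details (repository pin, etc.) are illustrative assumptions, not the actual implementation.

```cmake
# Hypothetical sketch of CMake/resolve_dependency_modules/cudf.cmake.
option(VELOX_ENABLE_CUDF "Build the experimental cuDF (GPU) backend" OFF)

if(VELOX_ENABLE_CUDF)
  include(FetchContent)
  FetchContent_Declare(
    cudf
    GIT_REPOSITORY https://github.com/rapidsai/cudf.git
    GIT_TAG <pinned-release-tag>) # pin to a specific cuDF release in practice
  FetchContent_MakeAvailable(cudf)
  # Note: cuDF links only against the CUDA runtime, not the CUDA driver library.
endif()
```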

I'm going to start opening PRs for the CI / build side soon. Ideas and feedback are welcome! 🚀
