This issue tracks the development of a RAPIDS cuDF backend for Velox. The project will be implemented in a series of incremental pull requests described below. Our initial goal is to run the TPC-H query plans defined by Velox, thereby covering a broad range of basic data analytics operations end-to-end on GPU.
All of the work described below currently exists in a fork of this repository, which we hope to upstream piece by piece. I plan to update this issue frequently as we make progress on the initial stages of this work.

cuDF team: @karthikeyann @devavret @mhaseeb123 @GregoryKimball
cc: @pedroerp @oerling @Yuhta @assignUser

High level design

Build:
A new Docker container with CUDA 12.8 is needed for developers (based on the existing Ubuntu 22.04 container). 🐳
cuDF is fetched and built if the CMake option VELOX_ENABLE_CUDF is true. The dependency logic will go in CMake/resolve_dependency_modules/cudf.cmake. (cuDF only links to the CUDA runtime, not the CUDA driver library.)
I plan to work with @assignUser on CI testing. We will probably want to run GPU builds on CPU nodes, and only run GPU CI tests if the cuDF code has changed.
Run:
A Velox DriverAdapter is used to replace CPU operators with GPU operators that call cuDF's C++ code. This DriverAdapter can be registered at startup by the application using Velox to enable the cuDF backend.
Each operator will have a cuDF equivalent. For example, OrderBy will be replaced by CudfOrderBy.
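To make the opt-in concrete, here is a minimal sketch of what application startup might look like. The registration entry point shown (cudf_velox::registerCudf() and its header path) is an assumption about the fork's API, not a confirmed interface; the point is only the pattern of installing the adapter before any tasks are created.

```cpp
// Minimal sketch of enabling the cuDF backend at application startup.
// NOTE: cudf_velox::registerCudf() and the header path below are assumptions
// about the fork's API; the actual entry point may differ.
#include "velox/experimental/cudf/exec/ToCudf.h"  // assumed header location

int main(int argc, char** argv) {
  // Install the cuDF DriverAdapter before any Task/Driver is created, so that
  // eligible CPU operators (e.g. OrderBy) are swapped for their GPU
  // counterparts (e.g. CudfOrderBy) when drivers are built.
  facebook::velox::cudf_velox::registerCudf();

  // ... build and execute Velox query plans as usual ...
  return 0;
}
```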
Between CPU and GPU operators, a conversion operator is inserted to handle CPU->GPU and GPU->CPU data movement. This allows cuDF operators to be used alongside existing Velox operators.
The conversion currently uses Arrow (Velox to Arrow, then Arrow to cuDF). A direct Velox-to-cuDF interop without Arrow may be built in the future for higher performance.
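A sketch of the CPU->GPU leg of that conversion is below. It assumes Velox's Arrow bridge (exportToArrow in velox/vector/arrow/Bridge.h) and libcudf's Arrow C data interface interop (cudf::from_arrow_host, present in recent cuDF releases); the exact overloads, option arguments, and the Arrow struct definitions (which come from the Arrow C device data interface headers) are simplified and should be read as assumptions rather than the fork's actual code.

```cpp
// Sketch: move one Velox RowVector from host memory to a libcudf table on the
// GPU via the Arrow C data interface. Overloads and interop entry points are
// believed correct for recent Velox/cuDF versions but are assumptions here.
// ArrowArray / ArrowSchema / ArrowDeviceArray / ARROW_DEVICE_CPU come from the
// Arrow C (device) data interface headers.
#include "velox/vector/arrow/Bridge.h"   // facebook::velox::exportToArrow
#include "velox/vector/ComplexVector.h"  // RowVectorPtr
#include <cudf/interop.hpp>              // cudf::from_arrow_host (assumed)
#include <cudf/table/table.hpp>
#include <memory>

using namespace facebook::velox;

std::unique_ptr<cudf::table> toCudf(
    const RowVectorPtr& input,
    memory::MemoryPool* pool) {
  // 1. Export the Velox vector and its schema into Arrow C structures.
  ArrowArray arrowArray;
  ArrowSchema arrowSchema;
  exportToArrow(input, arrowArray, pool);
  exportToArrow(input, arrowSchema);  // schema overload assumed

  // 2. Wrap the exported host data as an ArrowDeviceArray for libcudf.
  ArrowDeviceArray deviceArray{};
  deviceArray.array = arrowArray;
  deviceArray.device_id = -1;  // not applicable for CPU-resident data
  deviceArray.device_type = ARROW_DEVICE_CPU;

  // 3. Copy to the GPU; the result is a libcudf table in device memory.
  auto table = cudf::from_arrow_host(&arrowSchema, &deviceArray);

  // 4. Release the exported host-side Arrow structs (the data was copied).
  if (arrowSchema.release) {
    arrowSchema.release(&arrowSchema);
  }
  if (deviceArray.array.release) {
    deviceArray.array.release(&deviceArray.array);
  }
  return table;
}
```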
Currently, no custom CUDA kernels are needed for this code. All functionality is implemented in pure C++ calling cuDF, which implements the CUDA kernels.
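For example, the core of an operator like CudfOrderBy can reduce to a single libcudf call. The sketch below sorts a table by one key column with cudf::sort_by_key; the key selection and sort options are illustrative and not taken from the fork.

```cpp
// Sketch of the kind of host-side C++ a GPU operator body boils down to:
// sort a cudf::table_view by one key column using libcudf. No custom CUDA
// kernels are written here; cudf::sort_by_key launches them internally.
#include <cudf/sorting.hpp>
#include <cudf/table/table.hpp>
#include <cudf/table/table_view.hpp>
#include <memory>

std::unique_ptr<cudf::table> orderBySingleKey(cudf::table_view const& input) {
  // Illustrative choice: sort the whole table by its first column, ascending,
  // nulls last. A real CudfOrderBy would derive keys and orders from the plan.
  cudf::table_view keys{{input.column(0)}};
  return cudf::sort_by_key(
      input,
      keys,
      {cudf::order::ASCENDING},
      {cudf::null_order::AFTER});
}
```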
There is a lot more to say about tuning for performance (GPU batch sizes, CUDA streams, number of Velox drivers, ...) but I'm leaving that out of this document for the moment.
Tests:
We have some initial test coverage in place, based on existing Velox tests. We plan to expand the test coverage as we upstream this work.
Next steps
Currently, I expect that each of the lines below will be a separate PR.
New code in velox/experimental/cudf/. The first few feature PRs will be:
cudfDriverAdapter for replacing operators

I'm going to start opening PRs for the CI / build side soon. Ideas and feedback are welcome! 🚀