Skip to content

QinlongHuang/Customize-CUDA-ops-for-beginner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Code Template for Custom CUDA operations

Several simple examples for CUDA C++ (hello_cuda) and PyTorch calling custom CUDA operators (cuda_ops).

For CUDA C++ programs, use CMake or Ninja to build.

# Modern CMake(>= 3.8)
# CMake
cmake -S . -B build
cmake --build build

# Ninja
cmake -S . -Bbuild -G Ninja
cmake --build build

For PyTorch custom CUDA operators, see src/cuda_ops.

You can run nvprof or nsys to profile your CUDA ops, e.g.,

nsys profile src/cuda_ops/add.py

And then open the generated *.nsys-rep file in Nsight Systems.

TODO: For more tutorials, see docs.

Environments

  • OS: Ubuntu 20.04
  • GPU: NVIDIA RTX 4090 w/ Driver 525.147.05
  • CUDA: 11.8
  • Python: 3.8.18
  • PyTorch: 2.1.2+cu118
  • CMake: 3.26.0-rc6
  • Ninja: 1.10.0
  • GCC: 9.4.0

No guarantee to other environments.

Code structure

📦Custom-CUDA-Template
 ┣ 📂docs
 ┃ ┣ 📜CUDA编程模型简介.pdf
 ┃ ┗ 📜自定义CUDA算子用于PyTorch.pdf
 ┣ 📂src
 ┃ ┣ 📂cuda_ops
 ┃ ┃ ┣ 📂include
 ┃ ┃ ┃ ┗ 📜gemm.h
 ┃ ┃ ┣ 📂kernels
 ┃ ┃ ┃ ┣ 📜fused_leaky_relu_kernel.cu
 ┃ ┃ ┃ ┗ 📜gemm_kernel.cu
 ┃ ┃ ┣ 📂pytorch_wrapper
 ┃ ┃ ┃ ┣ 📜fused_leaky_relu_op.cpp
 ┃ ┃ ┃ ┣ 📜gemm_op.cpp
 ┃ ┃ ┃ ┣ 📜gemm_op_jit.cpp
 ┃ ┃ ┃ ┗ 📜gemm_op_st.cpp
 ┃ ┃ ┣ 📜CMakeLists.txt
 ┃ ┃ ┣ 📜__init__.py
 ┃ ┃ ┣ 📜add.py
 ┃ ┃ ┣ 📜fused_leaky_relu.py
 ┃ ┃ ┣ 📜readme.md
 ┃ ┃ ┣ 📜setup.py
 ┃ ┃ ┗ 📜test.py
 ┃ ┗ 📂hello_cuda
 ┃ ┃ ┣ 📂include
 ┃ ┃ ┃ ┗ 📜gemm.h
 ┃ ┃ ┣ 📜CMakeLists.txt
 ┃ ┃ ┣ 📜gemm_cpu.cpp
 ┃ ┃ ┣ 📜gemm_gpu_1thread.cu
 ┃ ┃ ┣ 📜gemm_gpu_mulblock.cu
 ┃ ┃ ┣ 📜gemm_gpu_mulblock_no_restrict.cu
 ┃ ┃ ┣ 📜gemm_gpu_mulblock_no_restrict_reg.cu
 ┃ ┃ ┣ 📜gemm_gpu_mulblock_reg.cu
 ┃ ┃ ┣ 📜gemm_gpu_multhread.cu
 ┃ ┃ ┣ 📜gemm_gpu_tiling.cu
 ┃ ┃ ┗ 📜main.cpp
 ┣ 📜.gitignore
 ┣ 📜.project-root
 ┣ 📜LICENSE
 ┗ 📜README.md

Reference

Recommendation

About

A demo for custom CUDA ops and calling them in PyTorch.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published