Add Support for Heterogeneous GPU Configurations in the Cuda Component #312

Open: wants to merge 15 commits into base: master

Treece-Burgess (Contributor) commented Feb 5, 2025

Pull Request Description

This PR adds support for heterogeneous GPU configurations to the cuda component. As a consequence, the following were also updated:

  • How we internally handle the default device id
  • How we handle creating a context for users if one is not provided
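
As a rough illustration of what "heterogeneous" means here, the sketch below groups devices by compute capability and reports whether more than one group exists. It is not taken from this PR: `gpu_info_t`, `group_by_compute_capability`, and `is_heterogeneous` are hypothetical names, and the compute capability values are assumed to have been queried from the CUDA API beforehand (for example with `cudaGetDeviceProperties`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device record. In real code these fields would be
 * filled from the CUDA API; here they are plain inputs. */
typedef struct {
    int cc_major; /* compute capability, major part */
    int cc_minor; /* compute capability, minor part */
} gpu_info_t;

/* Assign each device a group index so that devices with the same compute
 * capability share a group. Group indices are handed out in order of first
 * appearance. Returns the number of distinct groups. */
int group_by_compute_capability(const gpu_info_t *gpus, int n, int *group_of)
{
    assert(n >= 0);
    assert(n == 0 || (gpus != NULL && group_of != NULL));

    int num_groups = 0;
    for (int i = 0; i < n; i++) {
        group_of[i] = -1;
        /* Reuse the group of any earlier device with the same capability. */
        for (int j = 0; j < i; j++) {
            if (gpus[j].cc_major == gpus[i].cc_major &&
                gpus[j].cc_minor == gpus[i].cc_minor) {
                group_of[i] = group_of[j];
                break;
            }
        }
        /* No earlier match: this device starts a new group. */
        if (group_of[i] < 0) {
            group_of[i] = num_groups++;
        }
    }
    return num_groups;
}

/* A configuration is heterogeneous when its devices fall into more than
 * one compute capability group. group_of is caller-provided scratch space
 * with room for n entries. */
int is_heterogeneous(const gpu_info_t *gpus, int n, int *group_of)
{
    return group_by_compute_capability(gpus, n, group_of) > 1;
}
```

With the two test systems listed below, a V100 (compute capability 7.0) paired with an H100 (9.0) would land in two groups, while eight V100s would share one. Grouping rather than a plain yes/no check is a design choice for the sketch: it would let per-architecture state be set up once per group instead of once per device.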

Tested on Leconte (8 * V100) and Hexane (1 * V100 & 1 * H100):

Test Pass
HelloWorld.cu
HelloWorld_noCuCtx.cu
concurrent_profiling.cu
concurrent_profiling_noCuCtx.cu
cudaOpenMP.cu
cudaOpenMP_noCuCtx.cu
pthreads.cu
pthreads_noCuCtx.cu
simpleMultiGPU.cu
simpleMultiGPU_noCuCtx.cu
test_2thr_1gpu_not_allowed.cu
test_multi_read_and_reset.cu
test_multipass_event_fail.cu

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc.
  • Commits
    Commits are self-contained and only do one thing.
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution.
  • Tests
    The PR needs to pass all the tests.

Treece-Burgess added labels on Feb 5, 2025: component-cuda (PRs and Issues related to the cuda component), type-feature (Issues that request a new feature or PRs that add a new feature), status-ready-for-review (PR is ready to be reviewed)
Treece-Burgess force-pushed the 12.20.24-cuda-multi-gpu branch from efd58b5 to b19ea13 on February 12, 2025 18:02
Treece-Burgess force-pushed the 12.20.24-cuda-multi-gpu branch from 3d06eba to fe2a676 on February 14, 2025 13:56
dbarry9 (Contributor) commented Feb 19, 2025

I have tested this PR on two configurations:

  • V100+A100
  • V100+H100

I monitored FP32 and FP64 events on both systems and observed accurate counts.
