
Throwing a nice error if CUDA-aware MPI is not configured #3983

Open · glwagner opened this issue Dec 13, 2024 · 1 comment

Labels: GPU 👾 Where Oceananigans gets its powers from · user interface/experience 💻

glwagner (Member) commented Dec 13, 2024

Revisiting this briefly: we have found that MPI.has_cuda() only detects CUDA support for Open MPI builds, so it is not a general solution for determining whether CUDA-aware MPI is available.

However, there are other possible solutions. For example, we can write a test like:

using MPI
using CUDA
using Oceananigans.Architectures: architecture

MPI.Init()

function sendrecv_works(grid)
    arch = architecture(grid)      # the grid's Distributed architecture
    comm = arch.communicator
    rank = arch.local_rank
    nranks = MPI.Comm_size(comm)
    dst = mod(rank + 1, nranks)    # pass messages around a ring of ranks
    src = mod(rank - 1, nranks)
    N = 4
    FT = eltype(grid)
    send_mesg = CuArray{FT}(undef, N)
    recv_mesg = CuArray{FT}(undef, N)
    fill!(send_mesg, FT(rank))
    CUDA.synchronize()
    try
        # Passing CuArrays directly to MPI requires a CUDA-aware build
        MPI.Sendrecv!(send_mesg, recv_mesg, comm; dest=dst, source=src)
        return true
    catch err
        @warn "MPI.Sendrecv! test failed." exception=(err, catch_backtrace())
        return false
    end
end

adapted from https://gist.github.com/luraess/0063e90cb08eb2208b7fe204bbd90ed2

Originally posted by @glwagner in #3883 (comment)
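
A hypothetical sketch of how such a check could be wired into the "nice error" this issue asks for (the wrapper name validate_cuda_aware_mpi and the error message are illustrative, not existing Oceananigans API):

# Run the check once when a GPU Distributed architecture is set up, and throw a
# descriptive error instead of letting the first halo exchange fail cryptically.
function validate_cuda_aware_mpi(grid)
    sendrecv_works(grid) && return nothing
    error("CUDA-aware MPI does not appear to be available or configured correctly. " *
          "Oceananigans requires CUDA-aware MPI to exchange halos between GPUs.")
end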

glwagner added the user interface/experience 💻 and GPU 👾 Where Oceananigans gets its powers from labels on Dec 13, 2024
simone-silvestri (Collaborator) commented

Sounds good. We should also test that the message is received correctly; that way, the test cannot pass if MPI is configured incorrectly (for example, when size == 1 erroneously, so that dst == src, as in @francispoulin's case in #3981, which would let the test pass even without CUDA-aware MPI). A sketch of such a check is below.
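
A minimal sketch of that verification step (the function name sendrecv_delivers is hypothetical), assuming the same ring exchange as above: after the exchange, every entry of recv_mesg should equal the source rank, so the test fails if nothing was actually communicated:

# Verify delivery rather than only catching exceptions: each rank sends its own
# rank number around the ring and checks that it received its neighbor's.
function sendrecv_delivers(comm)
    rank = MPI.Comm_rank(comm)
    nranks = MPI.Comm_size(comm)
    nranks > 1 || return false  # one rank cannot exercise inter-rank communication
    dst = mod(rank + 1, nranks)
    src = mod(rank - 1, nranks)
    send_mesg = CUDA.fill(Float64(rank), 4)
    recv_mesg = CUDA.zeros(Float64, 4)
    CUDA.synchronize()
    MPI.Sendrecv!(send_mesg, recv_mesg, comm; dest=dst, source=src)
    return all(Array(recv_mesg) .== Float64(src))
end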
