Failing tests on AMPERE80 with gcc-13 and cuda-12.6 #73

Closed
DerNils-git opened this issue Jan 20, 2025 · 4 comments · Fixed by #74

Comments

@DerNils-git

DerNils-git commented Jan 20, 2025

I tried to update the compiler toolchain used to build heFFTe (c61c772).
This leads to failing tests with the CUDA backend on an NVIDIA A100.
The FFTW CPU backend works fine with the same compilers.

I use the following toolchain:

  • g++ (GCC) 13.1.0
  • cuda_12.6.r12.6/compiler.34714021_0
  • mpicxx: Open MPI 4.1.7 (Language: C++)
  • OS: SUSE Linux Enterprise Server 15 SP5

Container Image:
https://mpcdf.pages.mpcdf.de/ci-module-image/latest.html
gitlab-registry.mpcdf.mpg.de/mpcdf/ci-module-image/gcc-cuda:latest

The code was compiled using CMake 3.30 with the following preset:

  {
      "name": "heffte-gpu-non-cuda-aware",
      "displayName": "GCC GPU HeFFTe",
      "description": "Build HeFFTe using CUDA backend",
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Debug",
        "CMAKE_CXX_COMPILER": "g++",
        "CMAKE_C_COMPILER": "gcc",
        "CMAKE_CXX_EXTENSIONS":"Off",
        "Heffte_ENABLE_CUDA": "On",
        "Heffte_DISABLE_GPU_AWARE_MPI": "On"
      }
  }
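For reference, assuming the preset above is stored in a CMakePresets.json at the repository root and that the binary directory is ./build (both assumptions on my side, not stated in the issue), the configure/build/test sequence is roughly:

```shell
# configure with the preset, then build and run the test suite
cmake --preset heffte-gpu-non-cuda-aware -B build
cmake --build build -j
ctest --test-dir build
```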

and the tests fail with the following output:

$> ctest

Test project /u/nilsch/codes/heffte/build
      Start  1: unit_tests_nompi
 1/25 Test  #1: unit_tests_nompi .................Subprocess aborted***Exception:   0.63 sec

--------------------------------------------------------------------------------
                                 Non-MPI Tests
--------------------------------------------------------------------------------

                                             prime factorize              pass
                                                process grid              pass
                                               split pencils              pass
                                                 cpu scaling              pass
     float                                       gpu::vector              pass
    double                                       gpu::vector              pass
  ccomplex                                       gpu::vector              pass
  zcomplex                                       gpu::vector              pass
Values: 1;3 error magnitude: 2
Values: (1,-11);(0.75188,-8.27068) error magnitude: 2.74058
                                                 gpu scaling              pass
    double                               stock one-dimension              pass
  zcomplex                               stock one-dimension              pass
     float                           stock one-dimension r2c              pass
    double                           stock one-dimension r2c              pass
     float                   stock-cos-type-II one-dimension              pass
     float                   stock-sin-type-II one-dimension              pass
    double                   stock-cos-type-II one-dimension              pass
    double                   stock-sin-type-II one-dimension              pass
Values: (0,0);(3,0) error magnitude: 3
terminate called after throwing an instance of 'std::runtime_error'
  what():    test cufft one-dimension in file: /u/nilsch/codes/heffte/test/test_units_nompi.cpp line: 254

...

      Start  6: heffte_fft3d_np1
 6/25 Test  #6: heffte_fft3d_np1 .................***Failed    2.88 sec
Values: 0.707649;0 error magnitude: 0.707649
terminate called after throwing an instance of 'std::runtime_error'
  what():  mpi rank = 0  test -np 1  test heffte::fft3d in file: /u/nilsch/codes/heffte/test/test_fft3d.h line: 474
[ravg1078:116771] *** Process received signal ***
[ravg1078:116771] Signal: Aborted (6)
[ravg1078:116771] Signal code:  (-6)
[ravg1078:116771] [ 0] /lib64/libpthread.so.0(+0x16910)[0x14db4580b910]
[ravg1078:116771] [ 1] /lib64/libc.so.6(gsignal+0x10d)[0x14db341f3d2b]
[ravg1078:116771] [ 2] /lib64/libc.so.6(abort+0x177)[0x14db341f53e5]
[ravg1078:116771] [ 3] /mpcdf/soft/SLE_15/packages/x86_64/gcc/13.1.0/lib64/libstdc++.so.6(+0xa8377)[0x14db34448377]
[ravg1078:116771] [ 4] /mpcdf/soft/SLE_15/packages/x86_64/gcc/13.1.0/lib64/libstdc++.so.6(+0xb7b3c)[0x14db34457b3c]
[ravg1078:116771] [ 5] /mpcdf/soft/SLE_15/packages/x86_64/gcc/13.1.0/lib64/libstdc++.so.6(+0xb7ba7)[0x14db34457ba7]
[ravg1078:116771] [ 6] /mpcdf/soft/SLE_15/packages/x86_64/gcc/13.1.0/lib64/libstdc++.so.6(+0xb7e07)[0x14db34457e07]
[ravg1078:116771] [ 7] /u/nilsch/codes/heffte/build/test/test_fft3d_np1[0x42940e]
[ravg1078:116771] [ 8] /u/nilsch/codes/heffte/build/test/test_fft3d_np1[0x427271]
[ravg1078:116771] [ 9] /u/nilsch/codes/heffte/build/test/test_fft3d_np1[0x424cc3]
[ravg1078:116771] [10] /u/nilsch/codes/heffte/build/test/test_fft3d_np1[0x424cf2]
[ravg1078:116771] [11] /lib64/libc.so.6(__libc_start_main+0xef)[0x14db341de24d]
[ravg1078:116771] [12] /u/nilsch/codes/heffte/build/test/test_fft3d_np1[0x42417a]
[ravg1078:116771] *** End of error message ***
------------------------------------------------------

...

The following tests FAILED:
          1 - unit_tests_nompi (Subprocess aborted)
          6 - heffte_fft3d_np1 (Failed)
         15 - heffte_fft3d_r2c_np1 (Failed)
         21 - test_cos_np1 (Failed)

I removed some of the output to improve readability.
Any help is appreciated. Thank you in advance.

@mkstoyanov
Collaborator

As of heFFTe 2.4.1, the GPU-aware option is spelled Heffte_ENABLE_GPU_AWARE_MPI=ON/OFF, but the old name is still accepted, and it is not the reason for the failing tests. In fact, the nompi test runs without any MPI and the np1 tests run on a single rank. That is one clue about what may be going wrong.
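With the current spelling, the corresponding cacheVariables entry in the preset would be (equivalent to the deprecated Heffte_DISABLE_GPU_AWARE_MPI=On above):

```json
"Heffte_ENABLE_GPU_AWARE_MPI": "OFF"
```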

The gpu::vector tests are passing, which means CUDA memcpy works fine. However, the calls to cuFFT and the scaling kernel are failing.

  • Are the rest of the tests passing? Is the failure only in the nompi and np1 tests?
  • Can you run a simple "hello world"-level kernel on the system? No MPI, just the kernel. The scaling kernel in heFFTe is literally just an array scale:

template<typename scalar_type, int num_threads, typename index>
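A minimal standalone check along these lines could look as follows (a sketch on my part, not heFFTe code; hypothetical file check_kernel.cu, built with `nvcc check_kernel.cu -o check_kernel`). It launches a trivial scaling kernel and synchronizes, which surfaces launch and JIT errors like the one suspected here:

```cuda
#include <cstdio>
#include <vector>

// Trivial analogue of an array-scale kernel: x[i] *= s for all i < n.
__global__ void scale(double *x, int n, double s){
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main(){
    int const n = 1024;
    std::vector<double> host(n, 1.0);
    double *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(double));
    cudaMemcpy(dev, host.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0);
    // synchronize so that launch/JIT failures are reported here, not later
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess){
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(host.data(), dev, n * sizeof(double), cudaMemcpyDeviceToHost);
    std::printf("x[0] = %g (expect 2)\n", host[0]);
    cudaFree(dev);
    return 0;
}
```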

@DerNils-git
Author

Thank you for the feedback.

Most of the tests are failing.
(I am using a node with four GPUs, so all tests with more than 4 MPI ranks fail naturally.)

Additionally, I added a simple printf("Scaling-Kernel\n") to the kernel you pointed to above and called cudaDeviceSynchronize(); after all test_gpu_scale() GPU functions. Since I cannot see the corresponding output in the log file, I assume there is a problem with the kernel launch?

I attach a full log file of the test run with the above configuration.

heffte_test.log

@mkstoyanov
Collaborator

Try adding the following to the CMake options:

"CMAKE_CUDA_ARCHITECTURES":"native",

CMake demands that CMAKE_CUDA_ARCHITECTURES is defined and, if the user does not provide a value, heFFTe sets it to OFF, which in turn should switch off the precompile flags and resort to JIT compilation on the fly. However, something isn't working in the latest CUDA/CMake combination, and this yields the error "the provided PTX was compiled with an unsupported toolchain". I will have to change this in heFFTe, but the CMake flag should work for you right now.
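Applied to the preset from the issue, the change is one extra cacheVariables entry (a sketch; note that "native" requires a visible GPU at configure time and a reasonably recent CMake):

```json
"cacheVariables": {
    "CMAKE_BUILD_TYPE": "Debug",
    "CMAKE_CUDA_ARCHITECTURES": "native",
    "Heffte_ENABLE_CUDA": "On",
    "Heffte_DISABLE_GPU_AWARE_MPI": "On"
}
```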

On another note: by default, heFFTe launches all kernels on the default GPU 0. This will not cause a crash, since multiple MPI ranks can use the same GPU at the same time; the one GPU being hammered with work may run out of memory, but other than that, everything is fine. The benchmark has a -mps flag that distributes the ranks across the GPUs, and on large systems we assume that users will either provide MPI flags to limit the number of visible GPUs or manually set the desired stream/GPU.
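One common way to do the per-rank GPU binding manually under Open MPI is a small wrapper script (a hypothetical bind_gpu.sh on my part, not part of heFFTe; Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK for each rank on a node):

```shell
#!/bin/sh
# bind_gpu.sh: expose exactly one GPU to each local MPI rank.
# Open MPI sets OMPI_COMM_WORLD_LOCAL_RANK; fall back to GPU 0 if unset.
export CUDA_VISIBLE_DEVICES="${OMPI_COMM_WORLD_LOCAL_RANK:-0}"
exec "$@"
```

It would be invoked as, e.g., `mpirun -np 4 ./bind_gpu.sh ./your_app`, so each of the four ranks sees a different single GPU.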

@mkstoyanov
Collaborator

This should fix the issue when using a sufficiently new CMake.

#74

Please confirm that this is indeed the fix for you. Otherwise we'll keep digging.
