Failing tests on AMPERE80 with gcc-13 and cuda-12.6 #73

Closed
DerNils-git opened this issue Jan 20, 2025 · 4 comments · Fixed by #74

Comments

@DerNils-git

DerNils-git commented Jan 20, 2025

I tried to update the compiler toolchain used to build heFFTe (c61c772).
This leads to failing tests with the CUDA backend on an NVIDIA A100.
The FFTW CPU backend works fine with the same compilers.

I use the following toolchain:

  • g++ (GCC) 13.1.0
  • cuda_12.6.r12.6/compiler.34714021_0
  • mpicxx: Open MPI 4.1.7 (Language: C++)
  • OS: SUSE Linux Enterprise Server 15 SP5

Container Image:
https://mpcdf.pages.mpcdf.de/ci-module-image/latest.html
gitlab-registry.mpcdf.mpg.de/mpcdf/ci-module-image/gcc-cuda:latest

The code was compiled using CMake 3.30 with the following preset:

  {
      "name": "heffte-gpu-non-cuda-aware",
      "displayName": "GCC GPU HeFFTe",
      "description": "Build HeFFTe using CUDA backend",
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Debug",
        "CMAKE_CXX_COMPILER": "g++",
        "CMAKE_C_COMPILER": "gcc",
        "CMAKE_CXX_EXTENSIONS":"Off",
        "Heffte_ENABLE_CUDA": "On",
        "Heffte_DISABLE_GPU_AWARE_MPI": "On"
      }
  }
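For reference, assuming the preset above is stored in a CMakePresets.json at the repository root and that the binary directory is ./build (both assumptions on my side, not stated in the issue), the configure/build/test sequence is roughly:

```shell
# configure with the preset, then build and run the test suite
cmake --preset heffte-gpu-non-cuda-aware -B build
cmake --build build -j
ctest --test-dir build
```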

and the tests fail with the following output:

$> ctest

Test project /u/nilsch/codes/heffte/build
      Start  1: unit_tests_nompi
 1/25 Test  #1: unit_tests_nompi .................Subprocess aborted***Exception:   0.63 sec

--------------------------------------------------------------------------------
                                 Non-MPI Tests
--------------------------------------------------------------------------------

                                             prime factorize              pass
                                                process grid              pass
                                               split pencils              pass
                                                 cpu scaling              pass
     float                                       gpu::vector              pass
    double                                       gpu::vector              pass
  ccomplex                                       gpu::vector              pass
  zcomplex                                       gpu::vector              pass
Values: 1;3 error magnitude: 2
Values: (1,-11);(0.75188,-8.27068) error magnitude: 2.74058
                                                 gpu scaling              pass
    double                               stock one-dimension              pass
  zcomplex                               stock one-dimension              pass
     float                           stock one-dimension r2c              pass
    double                           stock one-dimension r2c              pass
     float                   stock-cos-type-II one-dimension              pass
     float                   stock-sin-type-II one-dimension              pass
    double                   stock-cos-type-II one-dimension              pass
    double                   stock-sin-type-II one-dimension              pass
Values: (0,0);(3,0) error magnitude: 3
terminate called after throwing an instance of 'std::runtime_error'
  what():    test cufft one-dimension in file: /u/nilsch/codes/heffte/test/test_units_nompi.cpp line: 254

...

      Start  6: heffte_fft3d_np1
 6/25 Test  #6: heffte_fft3d_np1 .................***Failed    2.88 sec
Values: 0.707649;0 error magnitude: 0.707649
terminate called after throwing an instance of 'std::runtime_error'
  what():  mpi rank = 0  test -np 1  test heffte::fft3d in file: /u/nilsch/codes/heffte/test/test_fft3d.h line: 474
[ravg1078:116771] *** Process received signal ***
[ravg1078:116771] Signal: Aborted (6)
[ravg1078:116771] Signal code:  (-6)
[ravg1078:116771] [ 0] /lib64/libpthread.so.0(+0x16910)[0x14db4580b910]
[ravg1078:116771] [ 1] /lib64/libc.so.6(gsignal+0x10d)[0x14db341f3d2b]
[ravg1078:116771] [ 2] /lib64/libc.so.6(abort+0x177)[0x14db341f53e5]
[ravg1078:116771] [ 3] /mpcdf/soft/SLE_15/packages/x86_64/gcc/13.1.0/lib64/libstdc++.so.6(+0xa8377)[0x14db34448377]
[ravg1078:116771] [ 4] /mpcdf/soft/SLE_15/packages/x86_64/gcc/13.1.0/lib64/libstdc++.so.6(+0xb7b3c)[0x14db34457b3c]
[ravg1078:116771] [ 5] /mpcdf/soft/SLE_15/packages/x86_64/gcc/13.1.0/lib64/libstdc++.so.6(+0xb7ba7)[0x14db34457ba7]
[ravg1078:116771] [ 6] /mpcdf/soft/SLE_15/packages/x86_64/gcc/13.1.0/lib64/libstdc++.so.6(+0xb7e07)[0x14db34457e07]
[ravg1078:116771] [ 7] /u/nilsch/codes/heffte/build/test/test_fft3d_np1[0x42940e]
[ravg1078:116771] [ 8] /u/nilsch/codes/heffte/build/test/test_fft3d_np1[0x427271]
[ravg1078:116771] [ 9] /u/nilsch/codes/heffte/build/test/test_fft3d_np1[0x424cc3]
[ravg1078:116771] [10] /u/nilsch/codes/heffte/build/test/test_fft3d_np1[0x424cf2]
[ravg1078:116771] [11] /lib64/libc.so.6(__libc_start_main+0xef)[0x14db341de24d]
[ravg1078:116771] [12] /u/nilsch/codes/heffte/build/test/test_fft3d_np1[0x42417a]
[ravg1078:116771] *** End of error message ***
------------------------------------------------------

...

The following tests FAILED:
          1 - unit_tests_nompi (Subprocess aborted)
          6 - heffte_fft3d_np1 (Failed)
         15 - heffte_fft3d_r2c_np1 (Failed)
         21 - test_cos_np1 (Failed)

I removed some of the output to improve readability.
Any help is appreciated. Thank you in advance.

@mkstoyanov
Collaborator

As of heFFTe 2.4.1, the GPU-aware option is spelled Heffte_ENABLE_GPU_AWARE_MPI=ON/OFF, but the old name is still accepted, and it is not the reason for the failing tests. In fact, the nompi test runs without any MPI and the np1 tests run on a single rank. That is one clue about what may be going wrong.
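With the current spelling, the corresponding cacheVariables entry in the preset would be (equivalent to the deprecated Heffte_DISABLE_GPU_AWARE_MPI=On above):

```json
"Heffte_ENABLE_GPU_AWARE_MPI": "OFF"
```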

The gpu::vector tests are passing, which means CUDA memcpy works fine. However, the calls to cuFFT and the scaling kernel are failing.

  • Are the rest of the tests passing? Is the failure only in the nompi and np1 tests?
  • Can you run a simple "hello world"-level kernel on the system? No MPI, just the kernel. The scaling kernel in heFFTe is literally just an array scale:

template<typename scalar_type, int num_threads, typename index>
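A minimal standalone check along these lines could look as follows (a sketch on my part, not heFFTe code; hypothetical file check_kernel.cu, built with `nvcc check_kernel.cu -o check_kernel`). It launches a trivial scaling kernel and synchronizes, which surfaces launch and JIT errors like the one suspected here:

```cuda
#include <cstdio>
#include <vector>

// Trivial analogue of an array-scale kernel: x[i] *= s for all i < n.
__global__ void scale(double *x, int n, double s){
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main(){
    int const n = 1024;
    std::vector<double> host(n, 1.0);
    double *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(double));
    cudaMemcpy(dev, host.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0);
    // synchronize so that launch/JIT failures are reported here, not later
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess){
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(host.data(), dev, n * sizeof(double), cudaMemcpyDeviceToHost);
    std::printf("x[0] = %g (expect 2)\n", host[0]);
    cudaFree(dev);
    return 0;
}
```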

@DerNils-git
Author

Thank you for the feedback.

Most of the tests are failing.
(I am using a node with four GPUs, so all tests with more than 4 MPI ranks fail naturally.)

Additionally, I added a simple printf("Scaling-Kernel\n") to the kernel you pointed to above and called cudaDeviceSynchronize(); after all test_gpu_scale() GPU functions. Since I cannot see the corresponding output in the log file, I assume there is a problem with the kernel launch?

I attach a full log file of the test run with the above configuration.

heffte_test.log

@mkstoyanov
Collaborator

Try adding the following to the CMake options:

"CMAKE_CUDA_ARCHITECTURES":"native",

CMake demands that CMAKE_CUDA_ARCHITECTURES is defined and, if the user does not provide a value, heFFTe sets it to OFF, which in turn should switch off the precompile flags and resort to JIT compilation on the fly. However, something isn't working in the latest CUDA/CMake combination, and this yields the error "the provided PTX was compiled with an unsupported toolchain". I will have to change this in heFFTe, but the CMake flag should work for you right now.
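Applied to the preset from the issue, the change is one extra cacheVariables entry (a sketch; note that "native" requires a visible GPU at configure time and a reasonably recent CMake):

```json
"cacheVariables": {
    "CMAKE_BUILD_TYPE": "Debug",
    "CMAKE_CUDA_ARCHITECTURES": "native",
    "Heffte_ENABLE_CUDA": "On",
    "Heffte_DISABLE_GPU_AWARE_MPI": "On"
}
```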

On another note: by default, heFFTe launches all kernels on the default GPU 0. This will not cause a crash, since multiple MPI ranks can use the same GPU at the same time; the one GPU being hammered with work may run out of memory, but other than that, everything is fine. The benchmark has a -mps flag that distributes the ranks across the GPUs, and on large systems we assume that users will either provide MPI flags to limit the number of visible GPUs or manually set the desired stream/GPU.
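One common way to do the per-rank GPU binding manually under Open MPI is a small wrapper script (a hypothetical bind_gpu.sh on my part, not part of heFFTe; Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK for each rank on a node):

```shell
#!/bin/sh
# bind_gpu.sh: expose exactly one GPU to each local MPI rank.
# Open MPI sets OMPI_COMM_WORLD_LOCAL_RANK; fall back to GPU 0 if unset.
export CUDA_VISIBLE_DEVICES="${OMPI_COMM_WORLD_LOCAL_RANK:-0}"
exec "$@"
```

It would be invoked as, e.g., `mpirun -np 4 ./bind_gpu.sh ./your_app`, so each of the four ranks sees a different single GPU.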

@mkstoyanov
Collaborator

This should fix the issue when using a sufficiently new CMake.

#74

Please confirm that this is indeed the fix for you. Otherwise we'll keep digging.
