PTX CUDA API: Segmentation fault, stuck in loop #302
tl;dr: it might be UB; use an older gcc/g++.

I ran into the same problem when I used g++-9 as my compiler. By using gdb, I found my program repeatedly looping in this function:

```cpp
unsigned CUDARTAPI __cudaPushCallConfiguration(dim3 gridDim, dim3 blockDim,
                                               size_t sharedMem = 0,
                                               struct CUstream_st *stream = 0) {
  if (g_debug_execution >= 3) {
    announce_call(__my_func__);
  }
  cudaConfigureCallInternal(gridDim, blockDim, sharedMem, stream);
}
```

I confirmed this. Furthermore, I wanted to know whether it was a mistake or an intended trick, so I used Ghidra to decompile it. My build turned out to be a nonsense loop:

```cpp
uint __cudaPushCallConfiguration(dim3 gridDim, dim3 blockDim, size_t sharedMem, CUstream_st *stream)
{
  size_t *psVar1;
  size_t *psVar2;
  dim3 blockDim_00;
  gpgpu_context *apgStack_50 [3];
  undefined auStack_38 [16];
  dim3 blockDim_local;
  dim3 gridDim_local;

  psVar2 = (size_t *)auStack_38;
  blockDim_local.x = (uint)blockDim._0_8_;
  blockDim_local.y = SUB84(blockDim._0_8_,4);
  blockDim_local.z = blockDim.z;
  psVar1 = (size_t *)auStack_38;
  if (2 < _g_debug_execution) goto LAB_00106308;
  do {
    *(undefined8 *)((long)psVar2 + -0x10) = 0;
    *(undefined8 *)((long)psVar2 + -0x18) = 0x106303;
    blockDim_00.z = *(uint *)((long)psVar2 + 0x18);
    blockDim_00._0_8_ = *(undefined8 *)((long)psVar2 + 0x10);
    cudaConfigureCallInternal
              (*(dim3 *)((long)psVar2 + 0x20),blockDim_00,sharedMem,stream,
               *(gpgpu_context **)((long)psVar2 + -0x10));
    psVar1 = (size_t *)((long)psVar2 + -0x10);
LAB_00106308:
    psVar2 = psVar1;
    psVar2[1] = (size_t)stream;
    *psVar2 = sharedMem;
    psVar2[-1] = 0x10631d;
    announce_call("unsigned int __cudaPushCallConfiguration(dim3, dim3, size_t, CUstream_st*)");
    stream = (CUstream_st *)psVar2[1];
    sharedMem = *psVar2;
  } while( true );
}
```

The version in the gpgpu-sim Docker image, by contrast, decompiles to code that matches the source:

```cpp
uint __cudaPushCallConfiguration(dim3 gridDim, dim3 blockDim, size_t sharedMem, CUstream_st *stream)
{
  cudaError_t cVar1;
  dim3 blockDim_local;
  dim3 gridDim_local;

  if (2 < _g_debug_execution) {
    announce_call("unsigned int __cudaPushCallConfiguration(dim3, dim3, size_t, CUstream_st*)");
  }
  cVar1 = cudaConfigureCallInternal(gridDim,blockDim,sharedMem,stream,(gpgpu_context *)0x0);
  return cVar1;
}
```

In summary, it seems there is a missing `return`, which triggers UB and leads to unexpected results. There also appear to be other instances of UB in the code, so using an older compiler might help, since it reproduces the same compiler behaviour. I hope this helps.
Has anyone dealt with this? My gcc/g++ version is 5.5.
It eventually exhausted all my constant memory and caused a core dump.
While executing GPGPU-Sim, I am getting a segmentation fault. Looking at the log file, the PTX CUDA API is stuck in a loop.
Executing the GEMM CUTLASS implementation also throws a segmentation fault.