Improve GPU thread safety #158

neil-lindquist · 2023-12-27T21:31:47Z

While working on #157, I ran into a few errors when using a column-major device distribution with the LU routines.

The first issue is that GPU->GPU copies weren't ensuring the correct synchronization. Specifically, we just assumed that any previous copies into src_device were complete. We had ensured this when the previous copy was GPU->GPU, but not when it was Host->GPU. fc0a5c fixes this by adding some extra queue synchronizations. On Saturn's 16 gtx1060 GPUs w/ single precision, this caused about a 5% performance hit, but it is needed for correctness. (The issue and fix only affect multiple GPUs/proc, so they don't affect Frontier.) Using CUDA streams et al., we might be able to recover some of that by avoiding the host synchronizations, but I didn't think I'd have enough time to do that this week with the other things I need to finish up.
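To illustrate the ordering hazard, here is a minimal host-only sketch in standard C++. It is not SLATE's code: `Queue` and `copy_via_device` are hypothetical names, and a worker thread stands in for a device queue. It only shows why the queue that filled src_device has to be synchronized before a different queue reads from it.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical in-order work queue standing in for a device queue or stream.
class Queue {
public:
    Queue() : worker_([this] { run(); }) {}
    ~Queue() {
        {
            std::lock_guard<std::mutex> g(m_);
            done_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }
    // Enqueue work and return immediately, like an asynchronous copy.
    void enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> g(m_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_all();
    }
    // Block the host until everything enqueued so far has finished.
    void sync() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return tasks_.empty() && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // shutting down with nothing left to run
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            busy_ = true;
            lk.unlock();
            task();                      // run the task without holding the lock
            lk.lock();
            busy_ = false;
            cv_.notify_all();            // wake any thread blocked in sync()
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool busy_ = false;
    bool done_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Host -> src_device on one queue, then src_device -> dst_device on another.
std::vector<double> copy_via_device(const std::vector<double>& host,
                                    Queue& h2d, Queue& d2d) {
    assert(&h2d != &d2d);  // the hazard only arises across distinct queues
    std::vector<double> src_device(host.size()), dst_device(host.size());
    h2d.enqueue([&] { src_device = host; });
    h2d.sync();  // the extra synchronization: src_device must be complete before it is read
    d2d.enqueue([&] { dst_device = src_device; });
    d2d.sync();  // make the result visible to the caller
    return dst_device;
}
```

Without the `h2d.sync()` call, the second queue could start reading src_device while the first is still writing it. Blocking the host is the simple fix; ordering the two queues without a host wait is the kind of optimization mentioned above.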

The second issue is that tileLayoutConvert(set<ij_tuple> tiles, int device, ...) uses different locking from tileGet, so making two calls to tileGet(set<ij_tuple> tiles, int device, ...) simultaneously can result in the layout of a tile instance being converted while the other call is trying to copy it. 26ef41 fixes this by replacing the batch transpose with individual, per-tile transposes within tileCopyDataLayout. On Frontier, this doesn't seem to affect the performance of gemm or gesv. (Interestingly, it seems to give a few percent speedup on Saturn's 16 gtx1060 GPUs w/ single precision.)
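The idea behind the per-tile approach can be sketched in standard C++ as follows. This is an illustration only, not SLATE's implementation: `TileInstance`, `copy_data_layout`, and `at` are hypothetical names, and a plain mutex per tile instance is an assumed locking scheme. The point is that the copy and the layout conversion happen in one critical section, so no other thread can observe or modify a tile between the two steps.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins; none of these names are SLATE's API.
enum class Layout { ColMajor, RowMajor };

struct TileInstance {
    TileInstance(Layout l, int64_t rows, int64_t cols)
        : layout(l), mb(rows), nb(cols), data(rows * cols) {}
    std::mutex mtx;  // guards layout, dimensions, and data together
    Layout layout;
    int64_t mb, nb;  // rows, columns
    std::vector<double> data;
};

// Read element (i, j) according to the tile's current layout.
double at(const TileInstance& t, int64_t i, int64_t j) {
    return t.layout == Layout::ColMajor ? t.data[i + j * t.mb]
                                        : t.data[i * t.nb + j];
}

// Copy src into dst and leave dst in the target layout, holding both locks
// for the whole operation instead of converting in a separate batched pass.
void copy_data_layout(TileInstance& src, TileInstance& dst, Layout target) {
    assert(&src != &dst);  // locking the same mutex twice would be undefined
    std::scoped_lock guard(src.mtx, dst.mtx);  // C++17, avoids lock-order deadlock
    dst.mb = src.mb;
    dst.nb = src.nb;
    dst.data.resize(src.data.size());
    dst.layout = target;
    for (int64_t j = 0; j < src.nb; ++j) {
        for (int64_t i = 0; i < src.mb; ++i) {
            double v = at(src, i, j);
            if (target == Layout::ColMajor)
                dst.data[i + j * dst.mb] = v;
            else
                dst.data[i * dst.nb + j] = v;
        }
    }
}
```

Two threads can then copy the same source tile to different destinations with different layouts, for example `std::thread t1([&] { copy_data_layout(src, a, Layout::RowMajor); });` alongside a second thread targeting `Layout::ColMajor`, without either one seeing a half-converted tile.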


mgates3 · 2024-01-05T21:41:36Z

I rebased to keep a simpler history and merged it in.

neil-lindquist changed the title ~~Improve gpu thread safety~~ Improve GPU thread safety Dec 27, 2023

mgates3 approved these changes Jan 5, 2024


neil-lindquist added 2 commits January 5, 2024 16:37

Add extra synchronization points for GPU->GPU copies

30c4713

Avoid separate layout conversion

2c609a1

Because of locking, it can lead to race conditions

mgates3 force-pushed the improve-gpu-safety branch from 26ef41f to 2c609a1 Compare January 5, 2024 21:38

mgates3 merged commit 15642f1 into icl-utk-edu:master Jan 5, 2024
8 checks passed

neil-lindquist deleted the improve-gpu-safety branch January 8, 2024 22:08
