Improve GPU thread safety #158

Merged
merged 2 commits into from
Jan 5, 2024


neil-lindquist
Contributor

While working on #157, I ran into a few errors when using a column-major device distribution with the LU routines.

The first issue is that GPU->GPU copies were not synchronized correctly. Specifically, we simply assumed that any previous copies into src_device were complete. We had ensured this when the previous copy was GPU->GPU, but not when it was Host->GPU. fc0a5c fixes this by adding some extra queue synchronizations. On Saturn's 16 gtx1060 GPUs w/ single precision, this caused about a 5% performance hit, but it is needed for correctness. (The issue and fix only affect runs with multiple GPUs/proc, so they don't affect Frontier.) Using CUDA streams et al., we might be able to recover some of that by avoiding the host synchronizations, but I didn't think I'd have enough time to do that this week with the other things I need to finish up.
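To illustrate the ordering problem, here is a minimal, single-threaded toy model in plain C++. It is not SLATE's queue API: `ToyQueue`, `ToyDevice`, and the copy helpers are hypothetical names, and "asynchronous" work is modeled as tasks that only run when the queue is synchronized.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device queue: enqueue() returns immediately,
// and the queued work only runs when sync() is called.
class ToyQueue {
public:
    void enqueue(std::function<void()> work) {
        pending_.push_back(std::move(work));
    }
    void sync() {
        while (!pending_.empty()) {
            std::function<void()> work = std::move(pending_.front());
            pending_.pop_front();
            work();
        }
    }
private:
    std::deque<std::function<void()>> pending_;
};

// Hypothetical device holding one tile and one queue.
struct ToyDevice {
    std::vector<float> tile;
    ToyQueue queue;
};

// Host->GPU copy: queued on the destination device, not yet complete on return.
void copy_host_to_device(const std::vector<float>& host, ToyDevice& dst) {
    dst.queue.enqueue([&host, &dst] { dst.tile = host; });
}

// GPU->GPU copy. With sync_source == false, it assumes earlier copies into
// src have finished, which is the assumption that was not guaranteed.
void copy_device_to_device(ToyDevice& src, ToyDevice& dst, bool sync_source) {
    if (sync_source) {
        src.queue.sync();  // the extra synchronization: wait for copies into src
    }
    dst.queue.enqueue([&src, &dst] { dst.tile = src.tile; });
}

// Host -> dev0 -> dev1, returning what ends up on dev1.
std::vector<float> demo_copy_chain(bool sync_source) {
    std::vector<float> host(4, 1.5f);
    ToyDevice dev0, dev1;
    copy_host_to_device(host, dev0);
    copy_device_to_device(dev0, dev1, sync_source);
    dev1.queue.sync();
    return dev1.tile;
}
```

In this model, `demo_copy_chain(true)` delivers the host data to `dev1`, while `demo_copy_chain(false)` copies `dev0`'s tile before the Host->GPU copy has run and leaves `dev1` with an empty tile. A real implementation could presumably order the two queues without blocking the host, which is the possible improvement mentioned above.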

The second issue is that tileLayoutConvert(set<ij_tuple> tiles, int device, ...) uses different locking from tileGet, so two simultaneous calls to tileGet(set<ij_tuple> tiles, int device, ...) can result in the layout of a tile instance being converted while another call is trying to copy it. 26ef41 fixes this by replacing the batch transpose with individual, per-tile transposes within tileCopyDataLayout. On Frontier, this doesn't seem to affect the performance of gemm or gesv. (Interestingly, it seems to give a few percent speedup on Saturn's 16 gtx1060 GPUs w/ single precision.)
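The idea of doing the transpose per tile, inside the copy and under the same lock, can be sketched as follows. This is a simplified illustration and not the code in tileCopyDataLayout: `ToyTile`, `convert_layout_locked`, `copy_tile_with_layout`, and `demo_layout_copy` are hypothetical, tiles are assumed square, and `src` and `dst` are assumed to be distinct objects.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

enum class Layout { ColMajor, RowMajor };

// Hypothetical square tile instance with its own lock.
struct ToyTile {
    std::size_t n = 0;
    Layout layout = Layout::ColMajor;
    std::vector<double> data;
    std::mutex lock;
};

// Flip the storage order by transposing in place.
// The caller must already hold tile.lock.
void convert_layout_locked(ToyTile& tile, Layout target) {
    if (tile.layout == target) return;
    for (std::size_t i = 0; i < tile.n; ++i)
        for (std::size_t j = i + 1; j < tile.n; ++j)
            std::swap(tile.data[i + j * tile.n], tile.data[j + i * tile.n]);
    tile.layout = target;
}

// Copy src into dst and convert dst to the target layout, one tile at a time.
// Both steps happen under the same locks, so no other caller can observe or
// convert either tile midway through the copy. Assumes &src != &dst.
void copy_tile_with_layout(ToyTile& src, ToyTile& dst, Layout target) {
    std::lock(src.lock, dst.lock);  // acquires both without deadlock
    std::lock_guard<std::mutex> src_guard(src.lock, std::adopt_lock);
    std::lock_guard<std::mutex> dst_guard(dst.lock, std::adopt_lock);
    dst.n = src.n;
    dst.layout = src.layout;
    dst.data = src.data;
    convert_layout_locked(dst, target);
}

// Copy a 2x2 column-major tile into a row-major destination and
// return the destination's storage.
std::vector<double> demo_layout_copy() {
    ToyTile src;
    src.n = 2;
    src.data = {1.0, 2.0, 3.0, 4.0};  // column-major: A(0,0)=1, A(1,0)=2, A(0,1)=3, A(1,1)=4
    ToyTile dst;
    copy_tile_with_layout(src, dst, Layout::RowMajor);
    return dst.data;
}
```

Because the copy and the conversion share one critical section per tile, concurrent callers serialize on the tile's mutex instead of relying on a separate batch routine with its own locking scheme.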

@neil-lindquist neil-lindquist changed the title Improve gpu thread safety Improve GPU thread safety Dec 27, 2023
@mgates3 mgates3 force-pushed the improve-gpu-safety branch from 26ef41f to 2c609a1 Compare January 5, 2024 21:38
@mgates3 mgates3 merged commit 15642f1 into icl-utk-edu:master Jan 5, 2024
8 checks passed
@mgates3
Collaborator

mgates3 commented Jan 5, 2024

I rebased to keep a simpler history and merged it in.

@neil-lindquist neil-lindquist deleted the improve-gpu-safety branch January 8, 2024 22:08

2 participants