-
-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Navi card subgroup shuffle support for gemm #512
base: master
Are you sure you want to change the base?
Changes from 6 commits
fde7943
4e35c89
a871605
b5fe348
3ef0f88
27648f5
ed32af0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -78,6 +78,11 @@ std::shared_ptr<Program> CompileFromSource( | |
} | ||
} | ||
|
||
if (device.IsGPU() && device.IsAMD() && device.Name().find("gfx1") != std::string::npos) { | ||
header_string += "#define USE_SUBGROUP_SHUFFLING 1\n"; | ||
header_string += "#define SUBGROUP_SHUFFLING_GCN 1\n"; | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Implementing it like this forces it on for all these AMD GPUs. Did you verify that using subgroup shuffling is always better compared to not using subgroup shuffling? The easiest way to test is to run the XGEMM kernel tuner with and without this modification and compare execution times of a good portion of the first chunk of kernels. The alternative to what you implemented here is to make it a AMD-specific tuning parameter, and then at tuning time it will decide whether subgroup shuffling was a good idea or not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It only is supposed to turn on with device name beginning with "gfx1", which I assume only applies on Navi cards, like gfx1010, gfx 1030, etc There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have run xgemm tuner and it seems a little bit better on rx 6900 xt card. I can test on rx 5700 xt too later. |
||
|
||
// For Qualcomm devices, specifying the OpenCL kernel attribute reqd_work_group_size reduces performance. | ||
// This option compiles without the workgroup size requirement and does not affect correctness. | ||
if (device.IsQualcomm()) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the left you write 32, on the right you write 4, one of them probably is incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It should be 32 (for Navi cards). Will change the comment.