How to tune small M shape matmul? #9

Open
leiwen83 opened this issue Jun 4, 2024 · 1 comment

Comments

leiwen83 commented Jun 4, 2024

Hi~

Very nice post!
I see the benchmark currently targets square shapes (M=N=K) at various sizes. If M is very small, e.g. M=1, N=1792, K=5120, how can that case be handled well?

I checked: the sgemm result is 101 GFLOPS, but kernel tune @10 only gets 25.7 GFLOPS...

I think the irregular shape may cause some trouble when moving data around. I'm not sure whether you could provide some insight on how to optimize for this.

Thx~

siboehm (Owner) commented Jun 4, 2024

For shapes like that, I assume cuBLAS runs a split-K kernel :) It additionally splits the work along the reduction dimension (K), which gains extra parallelism but requires either atomics or a second kernel launch to compute the final result.
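To make the split-K idea concrete, here is a minimal CPU-side sketch in plain Python. It is an illustration only, not what cuBLAS actually does: the names `matmul_slice`, `split_k_matmul`, and `num_splits` are hypothetical. Each K-slice stands in for work that a separate thread block could do on a GPU, and the final summation stands in for the atomics or the second reduction kernel.

```python
def matmul_slice(A, B, k_start, k_end):
    """Partial product of A (M x K) and B (K x N) over k in [k_start, k_end)."""
    M, N = len(A), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for k in range(k_start, k_end):
            a = A[i][k]
            for j in range(N):
                C[i][j] += a * B[k][j]
    return C


def split_k_matmul(A, B, num_splits):
    """Compute A @ B by splitting the reduction dimension K into slices.

    Assumption: num_splits >= 1. On a GPU each slice would map to its own
    group of thread blocks, so a tiny M still yields enough parallel work.
    """
    M, K, N = len(A), len(B), len(B[0])
    chunk = (K + num_splits - 1) // num_splits  # ceil(K / num_splits)

    # Step 1: independent partial results, one per K-slice.
    partials = [
        matmul_slice(A, B, start, min(start + chunk, K))
        for start in range(0, K, chunk)
    ]

    # Step 2: reduce the partials into the final output. This is the part
    # that needs atomics or a second kernel launch on a GPU.
    return [
        [sum(p[i][j] for p in partials) for j in range(N)]
        for i in range(M)
    ]
```

With M=1 the output has only N elements, so splitting K is where the additional independent work comes from. The cost is the extra reduction step and the temporary storage for the partial results.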
