[LLVMGPUVectorDistribute] Fix batch dimensions extraction for attention-like ops #19040

manupak · 2024-11-06T11:09:06Z

Depends on: #19042

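Currently, the batch dimensions are extracted as the dimensions common to Q, K and V (Q & K & V). This is incorrect when one of the inputs (Q, K or V) is broadcast along a dimension, i.e. that dimension does not appear in the input, because such a dimension is then dropped from the batch set.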

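Therefore, this commit changes the computation to:

B = Union(Q & K & O, K & V & O)

where & denotes the set of dimensions common to the listed operands. That is, a parallel dimension shared by Q, K and O (the first matmul) or by K, V and O (the second matmul) is treated as a batch dimension.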
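The difference between the two rules can be seen on a small case. The following is a minimal Python sketch, not IREE code. It assumes each operand is modeled by the set of iteration dimensions its indexing map uses. The dimension names (b0, b1, m, k1, k2, n) and the helper functions are hypothetical. In the example, V is broadcast along b1, so it does not index that dimension.

```python
# Hypothetical sketch (not IREE's API): each operand of an attention-like op
# is modeled as the set of iteration dimensions used by its indexing map.
# Dimension names are illustrative only.

def batch_dims_old(q, k, v):
    """Previous rule: dimensions common to Q, K and V."""
    return q & k & v


def batch_dims_new(q, k, v, o):
    """New rule: B = Union(Q & K & O, K & V & O).

    Q & K & O: dims shared by the first matmul's operands and the output.
    K & V & O: dims shared by the second matmul's operands and the output.
    """
    return (q & k & o) | (k & v & o)


# V is broadcast along b1: its indexing map does not use that dimension.
Q = {"b0", "b1", "m", "k1"}
K = {"b0", "b1", "k2", "k1"}
V = {"b0", "k2", "n"}
O = {"b0", "b1", "m", "n"}

print(sorted(batch_dims_old(Q, K, V)))     # ['b0']  (b1 is missed)
print(sorted(batch_dims_new(Q, K, V, O)))  # ['b0', 'b1']
```

Without broadcasting (e.g. V = {b0, b1, k2, n}), both rules return {b0, b1}. The rules only diverge here because b1 is absent from V.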

Signed-off-by: Manupa Karunaratne <[email protected]>

…on-like ops (iree-org#19040) Currently, the batch dimensions are extracted as the union of dimensions present across Q & K & V. This is not correct if one of the dims in inputs (Q,K and V) could be seen as broadcasting. Therefore, this commit changes this to be: B = Union ( Q & K & O , K & V & O ) where if parallel dimensions common between both matmuls will be treated as batching dimensions. Signed-off-by: Manupa Karunaratne <[email protected]>

…on-like ops (iree-org#19040) Currently, the batch dimensions are extracted as the union of dimensions present across Q & K & V. This is not correct if one of the dims in inputs (Q,K and V) could be seen as broadcasting. Therefore, this commit changes this to be: B = Union ( Q & K & O , K & V & O ) where if parallel dimensions common between both matmuls will be treated as batching dimensions. Signed-off-by: Manupa Karunaratne <[email protected]> Signed-off-by: Giacomo Serafini <[email protected]>

manupak requested review from hanhanW, MaheshRavishankar, qedawkins, kuhar and Groverkss as code owners November 6, 2024 11:09

manupak force-pushed the split-k2-online-attention branch 2 times, most recently from a98864a to ce7d7c1 November 6, 2024 16:26

manupak marked this pull request as draft November 6, 2024 16:27

Groverkss approved these changes Nov 7, 2024

View reviewed changes

manupak force-pushed the split-k2-online-attention branch from ce7d7c1 to 2c6d399 November 7, 2024 17:17

manupak marked this pull request as ready for review November 7, 2024 17:17

manupak merged commit d90aaae into iree-org:main Nov 7, 2024
36 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LLVMGPUVectorDistribute] Fix batch dimensions extraction for attention-like ops #19040

[LLVMGPUVectorDistribute] Fix batch dimensions extraction for attention-like ops #19040

manupak commented Nov 6, 2024 •

edited

Loading

[LLVMGPUVectorDistribute] Fix batch dimensions extraction for attention-like ops #19040

[LLVMGPUVectorDistribute] Fix batch dimensions extraction for attention-like ops #19040

Conversation

manupak commented Nov 6, 2024 • edited Loading

manupak commented Nov 6, 2024 •

edited

Loading