[LLVMGPUVectorDistribute] Fix batch dimensions extraction for attention-like ops #19040

Merged
merged 1 commit into from
Nov 7, 2024

Conversation

manupak
Contributor

@manupak manupak commented Nov 6, 2024

Depends on: #19042

Currently, the batch dimensions are extracted as the union of dimensions present across Q & K & V. This is incorrect when a dimension of one of the inputs (Q, K, or V) can be seen as broadcasting.

Therefore, this commit changes the extraction to:
B = Union ( Q & K & O , K & V & O )
where parallel dimensions common to both matmuls are treated as batching dimensions.
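
The formula above can be illustrated with plain set arithmetic. The following is a minimal sketch, not the code from this PR: it assumes each operand is described by the set of iteration-dimension indices its indexing map uses, and the function name `batch_dims` and the dimension numbering (d0 = batch, d1 = M, d2 = K1, d3 = K2, d4 = N) are hypothetical choices made for the example.

```python
def batch_dims(q, k, v, o):
    """Return B = Union(Q & K & O, K & V & O).

    Each argument is an iterable of iteration-dimension indices used by
    that operand (a hypothetical representation chosen for this sketch).
    """
    q, k, v, o = map(set, (q, k, v, o))
    # Dimensions shared by the first matmul (Q, K) and the output,
    # united with dimensions shared by the second matmul (K, V) and the output.
    return (q & k & o) | (k & v & o)


# Assumed layout: d0 = batch, d1 = M, d2 = K1, d3 = K2, d4 = N.
K = {0, 3, 2}
V = {0, 3, 4}
O = {0, 1, 4}

# No broadcasting: Q carries the batch dimension d0.
full = batch_dims({0, 1, 2}, K, V, O)        # {0}

# Q omits d0, so Q can be seen as broadcast along the batch dimension.
# d0 is still recovered through the K & V & O term.
broadcast_q = batch_dims({1, 2}, K, V, O)    # {0}
```

In both cases d0 is reported as the only batch dimension, even though Q does not index it in the second case.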

@manupak manupak force-pushed the split-k2-online-attention branch 2 times, most recently from a98864a to ce7d7c1 November 6, 2024 16:26
@manupak manupak marked this pull request as draft November 6, 2024 16:27
Signed-off-by: Manupa Karunaratne <[email protected]>
@manupak manupak force-pushed the split-k2-online-attention branch from ce7d7c1 to 2c6d399 November 7, 2024 17:17
@manupak manupak marked this pull request as ready for review November 7, 2024 17:17
@manupak manupak merged commit d90aaae into iree-org:main Nov 7, 2024
36 checks passed
JamesMBartlett pushed a commit to gimletlabs/iree that referenced this pull request Nov 8, 2024
Groverkss pushed a commit to Groverkss/iree that referenced this pull request Dec 1, 2024
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
Signed-off-by: Giacomo Serafini <[email protected]>