Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GPU] Move tile and distribute pass before packing to intrinsic for TileAndfuse pipeline #19053

Conversation

nirvedhmeshram
Copy link
Contributor

@nirvedhmeshram nirvedhmeshram commented Nov 6, 2024

We want to first distribute to workgroups so that we can promote operands to handle unaligned to intrinsic cases before we concretize the mma shapes.

@nirvedhmeshram
Copy link
Contributor Author

@kuhar your comments are on the wrong commit but I will address them on #19044

@Groverkss
Copy link
Contributor

This seems like a hack... Can you give an example where it is required to TileAndDistribute first before packing to intrinsics to make it work for unaligned shapes?

@nirvedhmeshram
Copy link
Contributor Author

This seems like a hack... Can you give an example where it is required to TileAndDistribute first before packing to intrinsics to make it work for unaligned shapes?

I think how it is right now, is actually the hack done by this PR #18565

The basic problem I was facing after that PR is for unaligned shapes tile and distribute introduces the extract slice but if we do the operand promotion which happens before pack to intrinsic then that logic doesnt quite work. @qedawkins can you please elaborate on why we broadly want the sequence that this PR is doing?

@MaheshRavishankar
Copy link
Contributor

This seems like a hack... Can you give an example where it is required to TileAndDistribute first before packing to intrinsics to make it work for unaligned shapes?

I think that exactly what we should be doing. After you distribute you promote the operands and read into a padded shared memory buffer

Copy link
Contributor

@Max191 Max191 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good. I thought we may have to rethink the pack/unpack decomposition passes, but if tests are passing then it is probably fine.

Main comment is that the IGEMM pass should still run before workgroup distribution. I'm also curious why one of the TileAndFuse pipeline tests has changed.

…ileAndfuse pipeline

Signed-off-by: Nirvedh <[email protected]>
Signed-off-by: Nirvedh Meshram <[email protected]>
@nirvedhmeshram nirvedhmeshram force-pushed the tile_and_fuse_do_workgroup_distribution_first branch from 6c83f21 to 0933cc1 Compare November 13, 2024 21:05
@nirvedhmeshram
Copy link
Contributor Author

@Max191 all tests are passing cleanly now due to improving the hoisting pattern and correcting the distribution to happen after IGEMM conversion :)

Copy link
Contributor

@Max191 Max191 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good!

@nirvedhmeshram nirvedhmeshram merged commit f828914 into iree-org:main Nov 14, 2024
36 checks passed
Groverkss pushed a commit to Groverkss/iree that referenced this pull request Dec 1, 2024
…TileAndfuse pipeline (iree-org#19053)

We want to first distribute to workgroups so that we can promote
operands to handle unaligned to intrinsic cases before we concretize the
mma shapes.

Signed-off-by: Nirvedh Meshram <[email protected]>
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
…TileAndfuse pipeline (iree-org#19053)

We want to first distribute to workgroups so that we can promote
operands to handle unaligned to intrinsic cases before we concretize the
mma shapes.

Signed-off-by: Nirvedh Meshram <[email protected]>
Signed-off-by: Giacomo Serafini <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants