[GPU] Move tile and distribute pass before packing to intrinsic for TileAndfuse pipeline #19053
compiler/src/iree/compiler/Codegen/Common/PropagateReshapesByExpansion.cpp
This seems like a hack... Can you give an example where it is required to TileAndDistribute first before packing to intrinsics to make it work for unaligned shapes?
I think how it is right now is actually the hack, introduced by PR #18565. The basic problem I was facing after that PR is that, for unaligned shapes, tile and distribute introduces an extract slice, but if we do the operand promotion, which happens before pack to intrinsic, then that logic doesn't quite work. @qedawkins can you please elaborate on why we broadly want the sequence that this PR is doing?
I think that's exactly what we should be doing. After you distribute, you promote the operands and read into a padded shared memory buffer.
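To make the ordering argument concrete, here is a minimal sketch in plain Python (not IREE code). The sizes and the helper names `distribute` and `padded_size` are hypothetical, chosen only for illustration. It shows how distributing an unaligned dimension first yields a boundary slice whose size is not a multiple of the intrinsic, and how a padded buffer sized after distribution makes each workgroup's slice divisible by the intrinsic again.

```python
def distribute(dim_size, workgroup_tile):
    """Split one dimension into per-workgroup (offset, size) slices.

    The last slice may be smaller than the tile when dim_size is unaligned.
    """
    return [(off, min(workgroup_tile, dim_size - off))
            for off in range(0, dim_size, workgroup_tile)]


def padded_size(size, intrinsic):
    """Round a slice size up to the next multiple of the intrinsic size."""
    return (size + intrinsic - 1) // intrinsic * intrinsic


# Illustrative values only: an unaligned dimension of 100, a hypothetical
# workgroup tile of 64, and a hypothetical intrinsic size of 16.
M, WORKGROUP_TILE, INTRINSIC = 100, 64, 16

for offset, size in distribute(M, WORKGROUP_TILE):
    buf = padded_size(size, INTRINSIC)
    print(f"offset={offset} size={size} padded_buffer={buf} "
          f"intrinsic_tiles={buf // INTRINSIC}")
```

In this toy model the boundary workgroup gets a slice of 36 elements, which only becomes a whole number of intrinsic tiles once it is read into a padded buffer of 48. That is the reason given above for promoting operands after distribution and before the MMA shapes are fixed.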
Overall, looks good. I thought we may have to rethink the pack/unpack decomposition passes, but if tests are passing then it is probably fine.
Main comment is that the IGEMM pass should still run before workgroup distribution. I'm also curious why one of the TileAndFuse pipeline tests has changed.
compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_tile_and_fuse.mlir
…ileAndfuse pipeline Signed-off-by: Nirvedh <[email protected]> Signed-off-by: Nirvedh Meshram <[email protected]>
6c83f21 to 0933cc1
@Max191 all tests are passing cleanly now due to improving the hoisting pattern and correcting the distribution to happen after IGEMM conversion :)
Nice, looks good!
…TileAndfuse pipeline (iree-org#19053) We want to first distribute to workgroups so that we can promote operands to handle unaligned to intrinsic cases before we concretize the mma shapes. Signed-off-by: Nirvedh Meshram <[email protected]>
…TileAndfuse pipeline (iree-org#19053) We want to first distribute to workgroups so that we can promote operands to handle unaligned to intrinsic cases before we concretize the mma shapes. Signed-off-by: Nirvedh Meshram <[email protected]> Signed-off-by: Giacomo Serafini <[email protected]>
We want to distribute to workgroups first so that we can promote operands, which handles cases that are unaligned to the intrinsic, before we concretize the MMA shapes.