[GPU] Use affine.linearize_index (and delinearize_index) where possible #19087

krzysz00 · 2024-11-08T20:56:19Z

There have been issues with the composition of affine maps being too general and losing important information, like the fact that affine_map<(s0 + s1 * 32 + ... - (s0 floorDiv 16) * 16)> really should be affine_map<(s0 mod 16 + s1 * 32 + ...)>, and other issues with the final IR that block low-level arithmetic optimizations.
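The simplification mentioned above rests on the identity `s0 - (s0 floorDiv 16) * 16 == s0 mod 16`. As a minimal sketch (the helper names `floorDiv`, `floorMod`, and `identityHolds` are hypothetical and not part of IREE or MLIR), the identity can be checked in plain C++:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: division rounding toward negative infinity,
// which is how floorDiv behaves in affine expressions.
int64_t floorDiv(int64_t a, int64_t b) {
  assert(b != 0);
  int64_t q = a / b;  // C++ truncates toward zero
  if ((a % b != 0) && ((a < 0) != (b < 0)))
    --q;              // adjust when the signs differ and there is a remainder
  return q;
}

// Hypothetical helper: non-negative remainder for a positive divisor.
int64_t floorMod(int64_t a, int64_t b) {
  assert(b > 0);
  return ((a % b) + b) % b;
}

// Checks s0 - (s0 floorDiv 16) * 16 == s0 mod 16 for a single value.
bool identityHolds(int64_t s0) {
  return s0 - floorDiv(s0, 16) * 16 == floorMod(s0, 16);
}
```

The point of the PR is that the `mod` form keeps the "this value is below 16" fact visible to later optimizations, while the subtract-and-multiply form hides it.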

The affine.delinearize_index operation represents the div/mod chains needed to break a flat index into its component parts. A recently added affine.linearize_index operation is its inverse - combining multiple indices into a flat 1D value.
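To make the inverse relationship concrete, here is a rough C++ model of the arithmetic the two operations stand for. It is a sketch only: the function names `delinearize` and `linearize` are hypothetical, it assumes non-negative indices with every basis entry bounded, and it does not reproduce the exact operand forms of the MLIR ops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Break a flat index into one component per basis entry (outermost first).
// This is the div/mod chain that a delinearize stands for.
std::vector<int64_t> delinearize(int64_t flat,
                                 const std::vector<int64_t> &basis) {
  assert(flat >= 0);
  std::vector<int64_t> out(basis.size());
  for (std::size_t i = basis.size(); i-- > 0;) {
    out[i] = flat % basis[i];  // innermost component first
    flat /= basis[i];          // then peel it off
  }
  return out;
}

// Combine per-dimension indices back into a flat 1D value.
int64_t linearize(const std::vector<int64_t> &idx,
                  const std::vector<int64_t> &basis) {
  assert(idx.size() == basis.size());
  int64_t flat = 0;
  for (std::size_t i = 0; i < idx.size(); ++i)
    flat = flat * basis[i] + idx[i];
  return flat;
}
```

With a basis of `{2, 4, 8}`, the indices `{1, 3, 5}` linearize to `(1 * 4 + 3) * 8 + 5 = 61`, and delinearizing `61` over the same basis gives back `{1, 3, 5}`. A flat thread ID built from per-dimension thread IDs follows the same pattern.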

Another advantage of linearize/delinearize is that their upstream canonicalizations are simpler and lead to more streamlined generated code.

This PR updates the vector distribution code and other GPU-related code that I could find to:

1. Use affine.linearize_index to construct flat thread IDs
2. Use affine.delinearize_index in places where there was a floorDiv/mod chain.

Additionally:

- Replace the previous approach to non-uniform execution, an scf.for with a thread ID as its initial input, with an scf.if, to better reflect the intended control flow and enable a linearize
- Plumb the subgroup size through the transfer_read and transfer_write distribution patterns to enable better reasoning about when you do/don't need to take a mod of the lane ID
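The reasoning about the lane ID mod can be illustrated with a small sketch. This is an assumption about the kind of check that knowing the subgroup size enables, not the logic of the actual distribution patterns; `needsLaneMod` and `laneIndex` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a lane ID always lies in [0, subgroupSize), so
// `laneId mod dimSize` can only differ from laneId when dimSize is
// smaller than the subgroup size.
bool needsLaneMod(int64_t subgroupSize, int64_t dimSize) {
  assert(subgroupSize > 0 && dimSize > 0);
  return dimSize < subgroupSize;
}

// Hypothetical helper: index along a dimension of size dimSize for a lane,
// skipping the mod when it is provably a no-op.
int64_t laneIndex(int64_t laneId, int64_t subgroupSize, int64_t dimSize) {
  assert(laneId >= 0 && laneId < subgroupSize);
  return needsLaneMod(subgroupSize, dimSize) ? laneId % dimSize : laneId;
}
```

Without the subgroup size, a pattern has to emit the mod conservatively; with it, the mod can be dropped whenever the dimension covers the whole subgroup.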

MaheshRavishankar · 2024-11-12T21:09:04Z

compiler/src/iree/compiler/Codegen/Common/GPU/GPUDistributeForall.cpp

@@ -143,6 +153,22 @@ LogicalResult resolveGPUMappedForallOp(RewriterBase &rewriter,
                             newBlockArgs);
  rewriter.eraseOp(forallTerminator);
  rewriter.eraseOp(forallOp);
+
+  // Step 5. Create the post-loop code that only executes on some workitems.
+  if (hasPostLoopTail) {


Could we move this into a separate PR?

krzysz00 · 2024-11-12T22:48:56Z

Closing in favor of #19122 so I can split the PR

krzysz00 force-pushed the gpu-distribute-with-linearize branch 2 times, most recently from 819a5ae to c67c36d Compare November 12, 2024 17:03

krzysz00 added 5 commits November 12, 2024 19:20

Update text test, going to need to a pattern for exact cancellations

bb8baad

Add subgroup size to distribute patterns, keep hacking on tests

8e0d38e

Next test

9810347

Final test cleanup

8192c87

krzysz00 force-pushed the gpu-distribute-with-linearize branch from c67c36d to 8192c87 Compare November 12, 2024 19:22

krzysz00 marked this pull request as ready for review November 12, 2024 19:48

krzysz00 requested review from MaheshRavishankar, qedawkins, kuhar, Groverkss, antiagainst and hanhanW as code owners November 12, 2024 19:48

MaheshRavishankar reviewed Nov 12, 2024

Test fix for rebase

3961ced

krzysz00 closed this Nov 12, 2024
