[LLVMGPU] Refactor VirtualMMAIntrinsic to its own attribute and enum. #19055

Merged 5 commits into iree-org:main on Nov 7, 2024

Conversation

@raikonenfnu (Collaborator, Author) commented on Nov 7, 2024:

VirtualMMA intrinsics are variants of MMA ops whose layouts are modified through interleaved reads and/or an unrolled K dimension. These are especially useful for coalescing reads from shared memory to registers, and for Flash Attention (FA) to align layouts between the first and second matmul.

However, the current implementation of virtual MMA lives in MMAAttr and MMAIntrinsic, which is a violation of abstractions, especially since data_tiled_mma_layout depends on it. Additionally, keeping VirtualMMA inside MMAAttr limits the flexibility to extend and specialize it in the future.

To resolve this conflict of abstractions, we are migrating VirtualMMA to its own enum and attribute class.

Changes in this PR:

  1. Factor out VMFMA from MMAAttr and MMAIntrinsic into its own attribute and enum, VirtualMMAAttr and VirtualMMAIntrinsic.
  2. Update places that may see either MMAAttr or VirtualMMAAttr to simply use MmaInterfaceAttr, such as KernelConfig, LLVMGPUConfigureTensorLayout, AMDGPUDistributeContract, and GPUNestedDistribution.
  3. Move get[A,B,C]SingleSubgroupLayout to global functions so that we can accept MmaInterfaceAttr in places that may see either MMAAttr or VirtualMMAAttr (see the sketch below).
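As a rough illustration of change 3, here is a minimal sketch of the new free-function dispatch, pieced together from the snippets quoted later in this review (the error-handling tail is an editor's assumption; the actual PR code may differ):

    MMASingleSubgroupLayout getASingleSubgroupLayout(MmaInterfaceAttr mmaKind) {
      if (auto mmaAttr = dyn_cast<MMAAttr>(mmaKind)) {
        return mmaAttr.getASingleSubgroupLayout();
      }
      if (auto vmmaAttr = dyn_cast<VirtualMMAAttr>(mmaKind)) {
        return vmmaAttr.getASingleSubgroupLayout();
      }
      // Not expected to happen; see the assert-vs-llvm_unreachable discussion below.
      assert(false && "unhandled MmaInterfaceAttr implementation");
      return {};
    }

Call sites then go through the interface attribute instead of casting to a concrete attribute, e.g. something like getASingleSubgroupLayout(qkSchedule.getIntrinsic()).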

@raikonenfnu force-pushed the moveVirtualMFMAIntrinsic branch from 757bfb0 to ad92d79 on November 7, 2024 02:56
@bjacob (Contributor) left a comment:

Thanks for taking this on! +1 on the general approach. Question below about changing normal functions to InterfaceMethods.

@@ -60,6 +60,48 @@ def IREEGPU_MmaInterfaceAttr : AttrInterface<"MmaInterfaceAttr"> {
      /*methodName=*/"getABCVectorTypes",
      /*args=*/(ins)
    >,
    InterfaceMethod<
@bjacob (Contributor) commented:

For every InterfaceMethod added in any new PR, I ask: why can't this simply be a global function declared in a .h instead? Two concerns here:

  1. The C++ debate between member and non-member functions. It's been an evolution in the whole C++ culture (some googling found a post which summarizes a few links) and in the C++ standard library itself (e.g. vector.begin() vs std::begin(vector); illustrated in the sketch after this list).
  2. In this instance it's also about going through generated code vs not. With InterfaceMethod, the indirection through generated code makes the code harder to follow, both for the human reader and for tools like clangd, which can't tell how many stack frames to walk when you click "go to implementation".
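A tiny self-contained example of that standard-library evolution (editor's illustration, not code from this PR):

    #include <iterator>
    #include <vector>

    int main() {
      std::vector<int> v{1, 2, 3};
      // The member and non-member spellings are equivalent for std::vector,
      // but the non-member form also works for C arrays and other types
      // that have no begin() member.
      return (v.begin() == std::begin(v)) ? 0 : 1;
    }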

In this instance, these were existing global functions, so why change that now?

@raikonenfnu (Collaborator, Author) replied on Nov 7, 2024:

In this instance, these were existing global functions, so why change that now?

Great question! Firstly, these methods have always been member methods of MMAAttr, and previously we always just casted attributes to MMAAttr and then called get[A,B,C]SingleSubgroupLayout.

e.g.:

  1.
        auto qkIntrinsic = cast<IREE::GPU::MMAAttr>(qkSchedule.getIntrinsic());
        auto pvIntrinsic = cast<IREE::GPU::MMAAttr>(pvSchedule.getIntrinsic());
        IREE::GPU::MMASingleSubgroupLayout lhsLayout =
            pvIntrinsic.getASingleSubgroupLayout();
        IREE::GPU::MMASingleSubgroupLayout rhsLayout =
            pvIntrinsic.getBSingleSubgroupLayout();
        IREE::GPU::MMASingleSubgroupLayout outLayout =
            qkIntrinsic.getCSingleSubgroupLayout();

  2.
        auto mmaAttr = llvm::cast<MMAAttr>(getIntrinsic());
        mmaAttr.getASingleSubgroupLayout();

Now that the attribute can also be a VirtualMMAAttr, and since VirtualMMAAttr and MMAAttr do not share a base class, the path of least motion was to move these member methods of MMAAttr into MmaInterfaceAttr.

That being said, I do agree it is probably safer and cleaner to make these methods global functions, so let me push that up now. :)

@raikonenfnu (Collaborator, Author):

I think it's done, LMK if this is what you have in mind or if you think we can clean it up some more!

@bjacob (Contributor):

Good point! I didn't remember it was preexisting, but it is. I still think we should change existing InterfaceMethods to plain C++, though, and it's worth the effort.

@kuhar (Member) left a comment:

Looks good overall, just some nits

}
}
// This should not happen but just to make GCC happy.
return std::make_tuple(VectorType{}, VectorType{}, VectorType{});
@kuhar (Member) commented:

Suggested change:

    - return std::make_tuple(VectorType{}, VectorType{}, VectorType{});
    + return {};

Also add llvm_unreachable if it's not supposed to happen?

@bjacob (Contributor) commented:

@kuhar, llvm_unreachable is a powerful optimization tool: it allows the compiler to assume a place is unreachable. But it's also dangerous if it's ever reached, so it only makes sense in very, very performance-critical code. In all other cases, if you want to catch something that's not supposed to happen, you probably want an assertion (assert(false)); all the usual objections against assertions ("don't risk crashing the compiler on bad user input") apply even more strongly against llvm_unreachable.

@bjacob (Contributor), Nov 7, 2024:

Also note that here, GCC is saying something very important and Clang's behavior is borderline buggy: these TableGen-generated enums unfortunately have a specified underlying type:

enum class MMAIntrinsic : uint32_t {

Unfortunately, the : uint32_t makes a big difference here: it makes it legal to form values that aren't any of the enumerator values. So when Clang says "you don't need a default: case in this switch because it would be UB anyway if the enum value were none of the enumerators", that's only correct for enums without a specified underlying type.

Here, unless you want to audit every caller, we have to treat it as a possibility that the enum value is none of the enumerators and so that "error" case is in fact reachable.

Note: https://eel.is/c++draft/expr.static.cast#9 :

If the enumeration type has a fixed underlying type, the value is first converted to that type by integral promotion ([conv.prom]) or integral conversion ([conv.integral]), if necessary, and then to the enumeration type.
If the enumeration type does not have a fixed underlying type, the value is unchanged if the original value is within the range of the enumeration values ([dcl.enum]), and otherwise, the behavior is undefined.
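A small self-contained demonstration of the point (editor's sketch with a made-up enum, not code from the PR):

    #include <cassert>
    #include <cstdint>

    enum class Kind : uint32_t { A = 0, B = 1 };

    int lanes(Kind k) {
      switch (k) {
      case Kind::A:
        return 16;
      case Kind::B:
        return 32;
      }
      // Reachable: with a fixed underlying type, non-enumerator values are legal.
      assert(false && "non-enumerator Kind value");
      return 0;
    }

    int main() {
      Kind bogus = static_cast<Kind>(7); // well-defined, per [expr.static.cast]/9
      return lanes(bogus);               // falls through the switch to the assert
    }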

@kuhar (Member), Nov 7, 2024:

The way I understand it, llvm_unreachable translates to:

  • Debug ==> assert(false)
  • Release ==> assume(false)

while a plain assert disappears in Release. So to me llvm_unreachable is fine as a tool to stop the compiler from complaining about missing return values etc., when there is something else that structurally checks that all the cases were handled; and I think that happens here via warnings about unhandled switch cases.

But I don't care about it much in either direction
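A rough model of the behavior being described, as an editor's sketch (this is not LLVM's actual macro, though the real one is similar in spirit; __builtin_unreachable is a GCC/Clang builtin):

    #include <cassert>

    // Editor's rough model of llvm_unreachable's two modes:
    #ifdef NDEBUG
      // Release: an optimizer hint; undefined behavior if actually reached.
      #define MY_UNREACHABLE(msg) __builtin_unreachable()
    #else
      // Debug: a hard, message-carrying failure.
      #define MY_UNREACHABLE(msg) assert(false && msg)
    #endif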

@kuhar (Member):

I replied to the first comment before I saw your second one; assert makes sense to me then. I didn't realize it was an open set of possible values.

@bjacob (Contributor):

While plain assert disappears in Release.

Yes. Apparently the idea is that the same control paths are exercised in Debug and Release builds. That assumption seems untenable in practice: users use Release builds, and they will run into corner cases that weren't hit by tests and local developer invocations on Debug builds.

@raikonenfnu (Collaborator, Author), Nov 7, 2024:

good catch! done!

EDIT: Meant this as a review comment, please ignore for now.


@raikonenfnu requested review from kuhar and bjacob on November 7, 2024 17:46
    MMASingleSubgroupLayout getASingleSubgroupLayout(MmaInterfaceAttr mmaKind) {
      if (auto mmaAttr = llvm::dyn_cast<MMAAttr>(mmaKind)) {
        return mmaAttr.getASingleSubgroupLayout();
      } else if (auto vmmaAttr = llvm::dyn_cast<VirtualMMAAttr>(mmaKind)) {

@raikonenfnu (Collaborator, Author):

haven't seen this doc, very cool! will do this, thanks!

//===----------------------------------------------------------------------===//

    MMASingleSubgroupLayout getASingleSubgroupLayout(MmaInterfaceAttr mmaKind) {
      if (auto mmaAttr = llvm::dyn_cast<MMAAttr>(mmaKind)) {
@kuhar (Member) commented:

nit: you shouldn't need llvm:: in front of casts? Also everywhere else

@raikonenfnu (Collaborator, Author):

Oh, I have always thought LLVM has a special dyn_cast; are llvm::dyn_cast and dyn_cast the same?

@raikonenfnu (Collaborator, Author):

If it is indeed the same, is there a reason why most of the code in IREE uses llvm::dyn_cast vs regular dyn_cast? Do we have non-uniform namespace usage?

@kuhar (Member):

I think these come from an automatic refactoring script that we ran when migrating from X.dyn_cast&lt;T&gt;() to dyn_cast&lt;T&gt;(X).

@kuhar (Member):

but there should be no reason to add llvm:: in handwritten code

@bjacob (Contributor):

oh, I have always thought llvm has a special dyn_cast, is llvm::dyn_cast and dyn_cast the same?

What else could dyn_cast be than llvm::dyn_cast? (What other namespace has a dyn_cast identifier?)
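(For context, an editor's note: the unqualified spelling works in MLIR-based code because MLIR re-exports the LLVM casting utilities into its own namespace; a sketch of the pattern, along the lines of mlir/include/mlir/Support/LLVM.h:)

    namespace mlir {
    using llvm::cast;
    using llvm::dyn_cast;
    using llvm::dyn_cast_or_null;
    using llvm::isa;
    } // namespace mlir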

@raikonenfnu (Collaborator, Author), Nov 7, 2024:

Done! Although, if you have answers to these questions, I would still love to hear them! :)

EDIT: sorry, your previous responses didn't refresh/show up on my screen

@raikonenfnu (Collaborator, Author):

What else than llvm::dyn_cast could dyn_cast be? (What other namespace has a dyn_cast identifier?)

Ah, my bad! I was thinking of standard C++'s dynamic_cast.

@raikonenfnu merged commit f42b90d into iree-org:main on Nov 7, 2024
36 checks passed
JamesMBartlett pushed a commit to gimletlabs/iree that referenced this pull request Nov 8, 2024
Groverkss pushed a commit to Groverkss/iree that referenced this pull request Dec 1, 2024
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024