[LLVMGPU] Refactor VirtualMMAIntrinsic to its own attribute and enum. #19055
Conversation
Thanks for taking this on! +1 on the general approach. Question below about changing normal functions to `InterfaceMethod`s.
```
@@ -60,6 +60,48 @@ def IREEGPU_MmaInterfaceAttr : AttrInterface<"MmaInterfaceAttr"> {
      /*methodName=*/"getABCVectorTypes",
      /*args=*/(ins)
    >,
    InterfaceMethod<
```
For every `InterfaceMethod` added in any new PR, I ask: why can't this simply be a global function declared in a `.h` instead? Two concerns here:
- The C++ debate between member and non-member functions. It's been an evolution in the whole C++ culture (some googling found this post which summarizes a few links) and in the C++ standard library itself (e.g. `vector.begin()` vs `std::begin(vector)`).
- In this instance it's also about going through generated code vs. not. With `InterfaceMethod`, the indirection through generated code makes it harder to follow the code, both for the human reader and even for tools like `clangd`, which can't tell how many stack frames to walk when you click "go to implementation".

In this instance, these were existing global functions, so why change that now?
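To make the contrast concrete, here is a minimal sketch of the two shapes being compared (names taken from this PR; the free-function signature is illustrative, not the final API):

```cpp
// Option A: TableGen InterfaceMethod. The call site looks like a member call,
// but it dispatches through generated interface glue in the *.h.inc/*.cpp.inc
// files, which is the extra indirection described above:
//
//   MMASingleSubgroupLayout layout = mmaKind.getASingleSubgroupLayout();

// Option B: a plain function declared in a hand-written header. One hop from
// call site to definition, for both the human reader and clangd:
MMASingleSubgroupLayout getASingleSubgroupLayout(MmaInterfaceAttr mmaKind);

// Call site:
//   MMASingleSubgroupLayout layout = getASingleSubgroupLayout(mmaKind);
```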
> In this instance, these were existing global functions, so why change that now?
Great question! Firstly, I think these methods have always been member methods of `MMAAttr`, and previously we always just casted attributes into `MMAAttr` and then called these `get[A,B,C]SingleSubgroupLayout` methods, e.g.:
iree/compiler/src/iree/compiler/Codegen/LLVMGPU/LLVMGPUConfigureTensorLayouts.cpp, lines 311 to 318 in 2b1a8e7:

```cpp
auto qkIntrinsic = cast<IREE::GPU::MMAAttr>(qkSchedule.getIntrinsic());
auto pvIntrinsic = cast<IREE::GPU::MMAAttr>(pvSchedule.getIntrinsic());
IREE::GPU::MMASingleSubgroupLayout lhsLayout =
    pvIntrinsic.getASingleSubgroupLayout();
IREE::GPU::MMASingleSubgroupLayout rhsLayout =
    pvIntrinsic.getBSingleSubgroupLayout();
IREE::GPU::MMASingleSubgroupLayout outLayout =
    qkIntrinsic.getCSingleSubgroupLayout();
auto mmaAttr = llvm::cast<MMAAttr>(getIntrinsic());
mmaAttr.getASingleSubgroupLayout());
```
Now that it can also be a `VirtualMMAAttr`, and since `VirtualMMAAttr` and `MMAAttr` do not share a base class, the path of least motion was to move these member methods of `MMAAttr` into `MmaInterfaceAttr`.

That being said, I do agree it is probably safer and cleaner to move these methods out as global functions, so let me push that up now. :)
I think it's done, LMK if this is what you have in mind or if you think we can clean it up some more!
good point! I didn't remember it was preexisting, but it is. I still think we should change existing `InterfaceMethod`s to plain C++ instead, and it's worth the effort.
Looks good overall, just some nits
```cpp
  }
}
// This should not happen but just to make GCC happy.
return std::make_tuple(VectorType{}, VectorType{}, VectorType{});
```
Suggested change:

```diff
- return std::make_tuple(VectorType{}, VectorType{}, VectorType{});
+ return {};
```
Also add `llvm_unreachable` if it's not supposed to happen?
@kuhar, `llvm_unreachable` is a powerful optimization tool, by allowing the compiler to assume a place is unreachable, but it's also dangerous if it's ever reached, so it only makes sense to use it in very, very performance-critical code. In all other cases, if you want to catch something that's not supposed to happen, maybe you want an assertion (`assert(false)`); all the usual objections against assertions ("don't risk crashing the compiler on bad user input") also apply even more strongly against `llvm_unreachable`.
Also note that here, GCC is saying something very important and Clang's behavior is borderline buggy: these TableGen-generated enums unfortunately have a specified underlying type:

```cpp
enum class MMAIntrinsic : uint32_t {
```

Unfortunately, the `: uint32_t` makes a big difference here: it makes it legal to form values that aren't any of the enumerator values. So when Clang says "you don't need a `default:` case in this switch because it would be UB anyway if the enum value were none of the enumerators", that's only correct for enums without a specified underlying type.

Here, unless you want to audit every caller, we have to treat it as a possibility that the enum value is none of the enumerators, and so that "error" case is in fact reachable.

Note: https://eel.is/c++draft/expr.static.cast#9 :

> If the enumeration type has a fixed underlying type, the value is first converted to that type by integral promotion ([conv.prom]) or integral conversion ([conv.integral]), if necessary, and then to the enumeration type.
> If the enumeration type does not have a fixed underlying type, the value is unchanged if the original value is within the range of the enumeration values ([dcl.enum]), and otherwise, the behavior is undefined.
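A minimal self-contained sketch of the point (the enumerator names and values here are made up for the demo, not the actual generated ones):

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative stand-in for the TableGen-generated enum with a fixed
// underlying type.
enum class MMAIntrinsic : uint32_t {
  MFMA_F32_16x16x4_F32 = 0,
  MFMA_F32_16x16x16_F16 = 1,
};

const char *name(MMAIntrinsic i) {
  switch (i) {
  case MMAIntrinsic::MFMA_F32_16x16x4_F32:
    return "MFMA_F32_16x16x4_F32";
  case MMAIntrinsic::MFMA_F32_16x16x16_F16:
    return "MFMA_F32_16x16x16_F16";
  }
  // Because the underlying type is fixed, this point is genuinely reachable:
  // static_cast'ing any uint32_t to MMAIntrinsic is well-defined.
  return "<unknown MMAIntrinsic>";
}

int main() {
  MMAIntrinsic bogus = static_cast<MMAIntrinsic>(42); // well-defined, not UB
  std::printf("%s\n", name(bogus));                   // prints "<unknown MMAIntrinsic>"
}
```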
The way I understand it, `llvm_unreachable` translates to:
- Debug ==> `assert(false)`
- Release ==> `assume(false)`

While plain `assert` disappears in Release. So to me `llvm_unreachable` is fine as a tool to stop the compiler from complaining about missing return values etc., when there's something else that structurally checks that all the cases were handled. And I think this will happen via warnings about unhandled `switch` cases.

But I don't care about it much in either direction
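For reference, a rough sketch of that Debug/Release split, using a hypothetical `MY_UNREACHABLE` macro as a simplified stand-in (not LLVM's actual definition):

```cpp
#include <cassert>

#ifndef NDEBUG
// Debug build: behaves like a failed assertion (report and abort).
#define MY_UNREACHABLE(msg) assert(false && msg)
#else
// Release build: promises the optimizer this point is never reached;
// actually reaching it is then undefined behavior.
#define MY_UNREACHABLE(msg) __builtin_unreachable()
#endif
```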
I replied to the first comment before I saw your second one; assert makes sense to me then. I didn't realize it was an open set of possible values.
> While plain assert disappears in Release.

Yes. Apparently the idea is that the same control paths are exercised in Debug and Release builds. That assumption seems untenable in practice: users use Release builds, and they will run into corner cases that weren't hit by tests and local developer invocations on Debug builds.
good catch! done!
EDIT: Meant this as review comment, please ignore for now.
Filed llvm/llvm-project#115345
```cpp
MMASingleSubgroupLayout getASingleSubgroupLayout(MmaInterfaceAttr mmaKind) {
  if (auto mmaAttr = llvm::dyn_cast<MMAAttr>(mmaKind)) {
    return mmaAttr.getASingleSubgroupLayout();
  } else if (auto vmmaAttr = llvm::dyn_cast<VirtualMMAAttr>(mmaKind)) {
```
nit: no else after return: https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
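Applied to the snippet above, the guideline yields something like this (a sketch; the trailing fallback follows the earlier assert-vs-`llvm_unreachable` discussion rather than this PR's final code):

```cpp
MMASingleSubgroupLayout getASingleSubgroupLayout(MmaInterfaceAttr mmaKind) {
  if (auto mmaAttr = dyn_cast<MMAAttr>(mmaKind)) {
    return mmaAttr.getASingleSubgroupLayout();
  }
  // No `else` needed: the previous branch already returned.
  if (auto vmmaAttr = dyn_cast<VirtualMMAAttr>(mmaKind)) {
    return vmmaAttr.getASingleSubgroupLayout();
  }
  assert(false && "unhandled MmaInterfaceAttr implementation");
  return {};
}
```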
haven't seen this doc, very cool! will do this, thanks!
```cpp
//===----------------------------------------------------------------------===//

MMASingleSubgroupLayout getASingleSubgroupLayout(MmaInterfaceAttr mmaKind) {
  if (auto mmaAttr = llvm::dyn_cast<MMAAttr>(mmaKind)) {
```
nit: you shouldn't need `llvm::` in front of casts? Also everywhere else.
oh, I have always thought LLVM has a special `dyn_cast`; are `llvm::dyn_cast` and `dyn_cast` the same?
if it is indeed the same, is there a reason why most of the code in IREE uses `llvm::dyn_cast` vs. regular `dyn_cast`? Do we have a non-uniform namespace?
I think these come from some automatic refactoring script that we ran when migrating from `X.dyn_cast<T>()` to `dyn_cast<T>(X)`.
but there should be no reason to add `llvm::` in handwritten code
> oh, I have always thought llvm has a special dyn_cast, is llvm::dyn_cast and dyn_cast the same?

What else than `llvm::dyn_cast` could `dyn_cast` be? (What other namespace has a `dyn_cast` identifier?)
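For context (my reading of the upstream headers; worth double-checking against the current source): unqualified `dyn_cast` works in MLIR and IREE code because MLIR re-exports LLVM's casting helpers into the `mlir` namespace:

```cpp
// Abridged from mlir/include/mlir/Support/LLVM.h: these using-declarations
// pull the LLVM casting utilities into namespace mlir, so code living under
// that namespace (including IREE's compiler sources) can write
// dyn_cast<T>(x) without the llvm:: qualifier.
namespace mlir {
using llvm::cast;
using llvm::dyn_cast;
using llvm::dyn_cast_or_null;
using llvm::isa;
} // namespace mlir
```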
done! although, if you have answers to these Qs, I would still love to hear them! :)
EDIT: sorry, your previous responses didn't refresh/show up on my screen
> What else than llvm::dyn_cast could dyn_cast be? (What other namespace has a dyn_cast identifier?)

Ah, my bad! I was thinking of standard C++'s `dynamic_cast`.
VirtualMMA intrinsics are variants of MMA ops whose layouts are modified through interleaving of reads and/or unrolled-K. These are especially useful for coalescing reads from shared memory to registers, and for Flash Attention, to align layouts between the 1st and 2nd matmul.

However, the current implementation of virtual MMA lives in `MMAAttr` and `MMAIntrinsic`, which is a violation of abstractions, especially since `data_tiled_mma_layout` depends on it. Additionally, having VirtualMMA in `MMAAttr` limits the flexibility of adding more features and specifications to it in the future.

In order to resolve this conflict of abstractions, we are migrating VirtualMMA to its own enum and attr/class.
Things that we have changed in this PR:
1. Factor out VMFMA from `MMAAttr` and `MMAIntrinsic` into its own attr and enum, `VirtualMMAAttr` and `VirtualMMAIntrinsic`.
2. Update places that may use `MMAAttr` or `VirtualMMAAttr` to simply use `MmaInterfaceAttr`, such as `KernelConfig`, `LLVMGPUConfigureTensorLayout`, `AMDGPUDistributeContract`, and `GPUNestedDistribution`.
3. Move `get[A,B,C]SingleSubgroupLayout` to global functions so that we can cast to `MmaInterfaceAttr` in places that may use either `MMAAttr` or `VirtualMMAAttr` (a caller-side sketch follows below).
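As a caller-side illustration of points 2 and 3, here is a sketch based on the `LLVMGPUConfigureTensorLayouts` snippet quoted earlier in this thread (the `getB`/`getC` variants are assumed to mirror the `getASingleSubgroupLayout` free function shown in the diff):

```cpp
// Before: had to cast to the concrete MMAAttr to reach its member methods.
//   auto qkIntrinsic = cast<IREE::GPU::MMAAttr>(qkSchedule.getIntrinsic());
//   ... qkIntrinsic.getCSingleSubgroupLayout() ...

// After: stay on the interface type and call the free functions, so both
// MMAAttr and VirtualMMAAttr schedules go through the same code path.
auto qkIntrinsic =
    cast<IREE::GPU::MmaInterfaceAttr>(qkSchedule.getIntrinsic());
auto pvIntrinsic =
    cast<IREE::GPU::MmaInterfaceAttr>(pvSchedule.getIntrinsic());
IREE::GPU::MMASingleSubgroupLayout lhsLayout =
    getASingleSubgroupLayout(pvIntrinsic);
IREE::GPU::MMASingleSubgroupLayout rhsLayout =
    getBSingleSubgroupLayout(pvIntrinsic);
IREE::GPU::MMASingleSubgroupLayout outLayout =
    getCSingleSubgroupLayout(qkIntrinsic);
```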