[LLVMGPU] Refactor VirtualMMAIntrinsic to its own attribute and enum. #19055

Merged 5 commits into iree-org:main on Nov 7, 2024

Conversation

@raikonenfnu (Collaborator, Author) commented on Nov 7, 2024:

VirtualMMA intrinsics are variants of MMA ops whose layouts are modified through interleaved reads and/or an unrolled K dimension. These are especially useful for coalescing reads from shared memory to registers, and for Flash Attention (FA) to align layouts between the first and second matmul.

However, the current implementation of virtual MMA lives in MMAAttr and MMAIntrinsic, which is a violation of abstractions, especially since data_tiled_mma_layout depends on it. Additionally, keeping VirtualMMA inside MMAAttr limits the flexibility to extend and specialize it in the future.

To resolve this conflict of abstractions, we are migrating VirtualMMA to its own enum and attribute class.

Changes in this PR:

  1. Factor out VMFMA from MMAAttr and MMAIntrinsic into its own attribute and enum, VirtualMMAAttr and VirtualMMAIntrinsic.
  2. Update places that may see either MMAAttr or VirtualMMAAttr to simply use MmaInterfaceAttr, such as KernelConfig, LLVMGPUConfigureTensorLayout, AMDGPUDistributeContract, and GPUNestedDistribution.
  3. Move get[A,B,C]SingleSubgroupLayout to global functions so that we can accept MmaInterfaceAttr in places that may see either MMAAttr or VirtualMMAAttr (see the sketch below).
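As a rough illustration of change 3, here is a minimal sketch of the new free-function dispatch, pieced together from the snippets quoted later in this review (the error-handling tail is an editor's assumption; the actual PR code may differ):

    MMASingleSubgroupLayout getASingleSubgroupLayout(MmaInterfaceAttr mmaKind) {
      if (auto mmaAttr = dyn_cast<MMAAttr>(mmaKind)) {
        return mmaAttr.getASingleSubgroupLayout();
      }
      if (auto vmmaAttr = dyn_cast<VirtualMMAAttr>(mmaKind)) {
        return vmmaAttr.getASingleSubgroupLayout();
      }
      // Not expected to happen; see the assert-vs-llvm_unreachable discussion below.
      assert(false && "unhandled MmaInterfaceAttr implementation");
      return {};
    }

Call sites then go through the interface attribute instead of casting to a concrete attribute, e.g. something like getASingleSubgroupLayout(qkSchedule.getIntrinsic()).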

@raikonenfnu force-pushed the moveVirtualMFMAIntrinsic branch from 757bfb0 to ad92d79 on November 7, 2024 02:56
@bjacob (Contributor) left a comment:

Thanks for taking this on! +1 on the general approach. Question below about changing normal functions to InterfaceMethods.

@@ -60,6 +60,48 @@ def IREEGPU_MmaInterfaceAttr : AttrInterface<"MmaInterfaceAttr"> {
      /*methodName=*/"getABCVectorTypes",
      /*args=*/(ins)
    >,
    InterfaceMethod<
@bjacob (Contributor) commented:

For every InterfaceMethod added in any new PR, I ask: why can't this simply be a global function declared in a .h instead? Two concerns here:

  1. The C++ debate between member and non-member functions. It's been an evolution in the whole C++ culture (some googling found a post which summarizes a few links) and in the C++ standard library itself (e.g. vector.begin() vs std::begin(vector); illustrated in the sketch after this list).
  2. In this instance it's also about going through generated code vs not. With InterfaceMethod, the indirection through generated code makes the code harder to follow, both for the human reader and for tools like clangd, which can't tell how many stack frames to walk when you click "go to implementation".
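A tiny self-contained example of that standard-library evolution (editor's illustration, not code from this PR):

    #include <iterator>
    #include <vector>

    int main() {
      std::vector<int> v{1, 2, 3};
      // The member and non-member spellings are equivalent for std::vector,
      // but the non-member form also works for C arrays and other types
      // that have no begin() member.
      return (v.begin() == std::begin(v)) ? 0 : 1;
    }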

In this instance, these were existing global functions, so why change that now?

@raikonenfnu (Collaborator, Author) replied on Nov 7, 2024:

In this instance, these were existing global functions, so why change that now?

Great question! Firstly, these methods have always been member methods of MMAAttr, and previously we always just casted attributes to MMAAttr and then called get[A,B,C]SingleSubgroupLayout.

e.g.:

  1.
        auto qkIntrinsic = cast<IREE::GPU::MMAAttr>(qkSchedule.getIntrinsic());
        auto pvIntrinsic = cast<IREE::GPU::MMAAttr>(pvSchedule.getIntrinsic());
        IREE::GPU::MMASingleSubgroupLayout lhsLayout =
            pvIntrinsic.getASingleSubgroupLayout();
        IREE::GPU::MMASingleSubgroupLayout rhsLayout =
            pvIntrinsic.getBSingleSubgroupLayout();
        IREE::GPU::MMASingleSubgroupLayout outLayout =
            qkIntrinsic.getCSingleSubgroupLayout();

  2.
        auto mmaAttr = llvm::cast<MMAAttr>(getIntrinsic());
        mmaAttr.getASingleSubgroupLayout();

Now that the attribute can also be a VirtualMMAAttr, and since VirtualMMAAttr and MMAAttr do not share a base class, the path of least motion was to move these member methods of MMAAttr into MmaInterfaceAttr.

That being said, I do agree it is probably safer and cleaner to make these methods global functions, so let me push that up now. :)

@raikonenfnu (Collaborator, Author):

I think it's done, LMK if this is what you have in mind or if you think we can clean it up some more!

@bjacob (Contributor):

Good point! I didn't remember it was preexisting, but it is. I still think we should change existing InterfaceMethods to plain C++, though, and it's worth the effort.

@kuhar (Member) left a comment:

Looks good overall, just some nits

}
}
// This should not happen but just to make GCC happy.
return std::make_tuple(VectorType{}, VectorType{}, VectorType{});
@kuhar (Member) commented:

Suggested change:

    - return std::make_tuple(VectorType{}, VectorType{}, VectorType{});
    + return {};

Also add llvm_unreachable if it's not supposed to happen?

@bjacob (Contributor) commented:

@kuhar, llvm_unreachable is a powerful optimization tool: it allows the compiler to assume a place is unreachable. But it's also dangerous if it's ever reached, so it only makes sense in very, very performance-critical code. In all other cases, if you want to catch something that's not supposed to happen, you probably want an assertion (assert(false)); all the usual objections against assertions ("don't risk crashing the compiler on bad user input") apply even more strongly against llvm_unreachable.

@bjacob (Contributor), Nov 7, 2024:

Also note that here, GCC is saying something very important and Clang's behavior is borderline buggy: these TableGen-generated enums unfortunately have a specified underlying type:

enum class MMAIntrinsic : uint32_t {

Unfortunately, the : uint32_t makes a big difference here: it makes it legal to form values that aren't any of the enumerator values. So when Clang says "you don't need a default: case in this switch because it would be UB anyway if the enum value were none of the enumerators", that's only correct for enums without a specified underlying type.

Here, unless you want to audit every caller, we have to treat it as a possibility that the enum value is none of the enumerators and so that "error" case is in fact reachable.

Note: https://eel.is/c++draft/expr.static.cast#9 :

If the enumeration type has a fixed underlying type, the value is first converted to that type by integral promotion ([conv.prom]) or integral conversion ([conv.integral]), if necessary, and then to the enumeration type.
If the enumeration type does not have a fixed underlying type, the value is unchanged if the original value is within the range of the enumeration values ([dcl.enum]), and otherwise, the behavior is undefined.
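A small self-contained demonstration of the point (editor's sketch with a made-up enum, not code from the PR):

    #include <cassert>
    #include <cstdint>

    enum class Kind : uint32_t { A = 0, B = 1 };

    int lanes(Kind k) {
      switch (k) {
      case Kind::A:
        return 16;
      case Kind::B:
        return 32;
      }
      // Reachable: with a fixed underlying type, non-enumerator values are legal.
      assert(false && "non-enumerator Kind value");
      return 0;
    }

    int main() {
      Kind bogus = static_cast<Kind>(7); // well-defined, per [expr.static.cast]/9
      return lanes(bogus);               // falls through the switch to the assert
    }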

@kuhar (Member), Nov 7, 2024:

The way I understand it, llvm_unreachable translates to:

  • Debug ==> assert(false)
  • Release ==> assume(false)

while a plain assert disappears in Release. So to me llvm_unreachable is fine as a tool to stop the compiler from complaining about missing return values etc., when there is something else that structurally checks that all the cases were handled; and I think that happens here via warnings about unhandled switch cases.

But I don't care about it much in either direction
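A rough model of the behavior being described, as an editor's sketch (this is not LLVM's actual macro, though the real one is similar in spirit; __builtin_unreachable is a GCC/Clang builtin):

    #include <cassert>

    // Editor's rough model of llvm_unreachable's two modes:
    #ifdef NDEBUG
      // Release: an optimizer hint; undefined behavior if actually reached.
      #define MY_UNREACHABLE(msg) __builtin_unreachable()
    #else
      // Debug: a hard, message-carrying failure.
      #define MY_UNREACHABLE(msg) assert(false && msg)
    #endif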

@kuhar (Member):

I replied to the first comment before I saw your second one; assert makes sense to me then. I didn't realize it was an open set of possible values.

@bjacob (Contributor):

While plain assert disappears in Release.

Yes. Apparently the idea is that the same control paths are exercised in Debug and Release builds. That assumption seems untenable in practice: users use Release builds, and they will run into corner cases that weren't hit by tests and local developer invocations on Debug builds.

@raikonenfnu (Collaborator, Author), Nov 7, 2024:

good catch! done!

EDIT: Meant this as a review comment, please ignore for now.


@raikonenfnu requested review from kuhar and bjacob on November 7, 2024 17:46
    MMASingleSubgroupLayout getASingleSubgroupLayout(MmaInterfaceAttr mmaKind) {
      if (auto mmaAttr = llvm::dyn_cast<MMAAttr>(mmaKind)) {
        return mmaAttr.getASingleSubgroupLayout();
      } else if (auto vmmaAttr = llvm::dyn_cast<VirtualMMAAttr>(mmaKind)) {

@raikonenfnu (Collaborator, Author):

haven't seen this doc, very cool! will do this, thanks!

//===----------------------------------------------------------------------===//

    MMASingleSubgroupLayout getASingleSubgroupLayout(MmaInterfaceAttr mmaKind) {
      if (auto mmaAttr = llvm::dyn_cast<MMAAttr>(mmaKind)) {
@kuhar (Member) commented:

nit: you shouldn't need llvm:: in front of casts? Also everywhere else

@raikonenfnu (Collaborator, Author):

Oh, I have always thought LLVM has a special dyn_cast; are llvm::dyn_cast and dyn_cast the same?

@raikonenfnu (Collaborator, Author):

If it is indeed the same, is there a reason why most of the code in IREE uses llvm::dyn_cast vs regular dyn_cast? Do we have non-uniform namespace usage?

@kuhar (Member):

I think these come from an automatic refactoring script that we ran when migrating from X.dyn_cast&lt;T&gt;() to dyn_cast&lt;T&gt;(X).

@kuhar (Member):

but there should be no reason to add llvm:: in handwritten code

@bjacob (Contributor):

oh, I have always thought llvm has a special dyn_cast, is llvm::dyn_cast and dyn_cast the same?

What else could dyn_cast be than llvm::dyn_cast? (What other namespace has a dyn_cast identifier?)
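(For context, an editor's note: the unqualified spelling works in MLIR-based code because MLIR re-exports the LLVM casting utilities into its own namespace; a sketch of the pattern, along the lines of mlir/include/mlir/Support/LLVM.h:)

    namespace mlir {
    using llvm::cast;
    using llvm::dyn_cast;
    using llvm::dyn_cast_or_null;
    using llvm::isa;
    } // namespace mlir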

@raikonenfnu (Collaborator, Author), Nov 7, 2024:

Done! Although, if you have answers to these questions, I would still love to hear them! :)

EDIT: sorry, your previous responses didn't refresh/show up on my screen

@raikonenfnu (Collaborator, Author):

What else than llvm::dyn_cast could dyn_cast be? (What other namespace has a dyn_cast identifier?)

Ah, my bad! I was thinking of standard C++'s dynamic_cast.

@raikonenfnu merged commit f42b90d into iree-org:main on Nov 7, 2024
36 checks passed
JamesMBartlett pushed a commit to gimletlabs/iree that referenced this pull request Nov 8, 2024
Groverkss pushed a commit to Groverkss/iree that referenced this pull request Dec 1, 2024
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024