[Codegen][GPU] Add kernel config for LLVMGPUTileAndFuse #17791

Merged
merged 5 commits into from
Aug 17, 2024


qedawkins
Contributor

This adds kernel configuration logic for targeting simple thread distribution of linalg-based dispatches on LLVMGPU. The configuration logic is primarily copied from the equivalent logic on the SPIR-V side, whose heuristics are already well tested against the varied target descriptions found there.

Currently this is locked behind a flag, `iree-codegen-llvmgpu-use-tile-and-fuse`. Future patches will add specialized logic for matmul.
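
For anyone wanting to try the new path, here is a minimal sketch of how the flag might be passed to `iree-compile`. Only the flag name comes from this PR. The helper `build_compile_command`, the file names, and the choice of `--iree-hal-target-backends=rocm` are assumptions for illustration, and the target-selection flags a given setup needs may differ.

```python
def build_compile_command(input_path, output_path, use_tile_and_fuse=True):
    """Build an iree-compile argument list (hypothetical helper, not part of IREE)."""
    cmd = [
        "iree-compile",
        input_path,
        # Assumption: a ROCm target; substitute whatever LLVMGPU target you use.
        "--iree-hal-target-backends=rocm",
        "-o",
        output_path,
    ]
    if use_tile_and_fuse:
        # The opt-in flag introduced by this PR.
        cmd.append("--iree-codegen-llvmgpu-use-tile-and-fuse")
    return cmd


# Placeholder file names; pass the list to subprocess.run to actually compile.
command = build_compile_command("model.mlir", "model.vmfb")
print(" ".join(command))
```

Leaving `use_tile_and_fuse=False` produces the same command without the flag, which makes it easy to compare outputs between the existing pipelines and the new one.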

@qedawkins qedawkins force-pushed the simt_kernel_config branch 2 times, most recently from 55a8f7e to b180ec1 Compare August 8, 2024 20:20
@qedawkins qedawkins marked this pull request as ready for review August 9, 2024 17:47
@qedawkins qedawkins requested a review from Max191 August 9, 2024 17:47
Contributor

@Max191 Max191 left a comment


I did some testing with this, and there are some compiler failures and correctness issues with some dispatches. We should debug these issues before we land this change.

@qedawkins
Contributor Author

> I did some testing with this, and there are some compiler failures and correctness issues with some dispatches. We should debug these issues before we land this change.

Talked offline; the correctness issues looked to be a floating point precision ghost. I verified e2e correctness of this patch on SDXL int8 by generating an image.

@qedawkins qedawkins requested review from Max191 and kuhar August 15, 2024 14:43
Contributor

@Max191 Max191 left a comment


I think it is fine to land. There is a similar issue with VectorDistribute fusions, which may be causing minor precision differences. I talked with @MaheshRavishankar and we still want to track this precision difference. Ideally, it should be possible to turn these precision differences off when necessary, so that results are bitwise exact regardless of pipeline. Not blocking progress, since numerics are overall accurate, but we should track this.
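
As a generic illustration of why two pipelines can disagree bitwise while both being numerically reasonable, the sketch below sums the same four doubles in two different orders. This is not the diagnosed cause of the differences discussed here; it only assumes that different tiling or fusion choices can change the order in which floating-point operations are evaluated. The function names and input values are made up for the example.

```python
import math

# Values chosen so that rounding depends on the order of evaluation.
values = [1e16, 1.0, -1e16, 1.0]


def sum_left_to_right(xs):
    """Accumulate sequentially, as a single thread walking the data would."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_pairwise(xs):
    """Split in halves and combine, as a tree-style reduction would."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])


sequential = sum_left_to_right(values)
tree = sum_pairwise(values)
exact = math.fsum(values)  # correctly rounded sum, for reference

print(sequential, tree, exact)  # 1.0 0.0 2.0
```

Both orderings are valid ways to compute the sum, yet neither matches the other or the correctly rounded result, because floating-point addition is not associative. Getting bitwise-identical output across pipelines would require pinning down the evaluation order, which is the kind of switch suggested above.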

This adds kernel configuration logic for targeting simple thread
distribution of linalg-based dispatches on LLVMGPU. The configuration
logic is primarily copied from the same logic on the SPIR-V side due to
the already well tested heuristics there for the kinds of varied target
descriptions that are present for SPIR-V.

Currently this is locked behind a flag `iree-codegen-llvmgpu-use-tile-and-fuse`.
@qedawkins qedawkins merged commit 10ba28d into iree-org:main Aug 17, 2024
35 of 36 checks passed
@qedawkins qedawkins deleted the simt_kernel_config branch August 17, 2024 18:08