Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVMCPU] Add an additional level of tiling #19027

Merged
merged 5 commits into from
Nov 12, 2024
Merged

Conversation

pashu123
Copy link
Contributor

@pashu123 pashu123 commented Nov 5, 2024

Add an additional level of tiling, aka vector parallel to the CPU default pipeline. Some of the linalg op that is not specialized through any pipeline may hit a bufferization issue if passed through the default pipeline. Adding an extra level of tiling takes care of such cases.

Removes some ops (disabled for producer fusion) from dispatch Region creation. They were added in #18777 . For more info: #18900

Comment on lines 2510 to 2517
// Add an extra level of tiling.
if (auto linalgOp = dyn_cast<linalg::LinalgOp>(*op)) {
SmallVector<int64_t> vecTileSizes = distTileSizes;
limitVectorTileSizes(linalgOp, vecTileSizes);
tileSizes.push_back(vecTileSizes);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need the second level of tiling on any TilingInterface ops, so please add a TODO. It is the key to get smaller problem size; you don't need the special logics in Passes.cpp. What I'd do is something like below

Suggested change
// Add an extra level of tiling.
if (auto linalgOp = dyn_cast<linalg::LinalgOp>(*op)) {
SmallVector<int64_t> vecTileSizes = distTileSizes;
limitVectorTileSizes(linalgOp, vecTileSizes);
tileSizes.push_back(vecTileSizes);
}
SmallVector<int64_t> vecTileSizes = distTileSizes;
// TODO: ...
if (auto linalgOp = dyn_cast<linalg::LinalgOp>(*op)) {
limitVectorTileSizes(linalgOp, vecTileSizes);
}
tileSizes.push_back(vecTileSizes);

@@ -653,8 +653,14 @@ void addCPULinalgExtTileAndVectorizePipeline(
}
}

void addCPUDefaultPassPipeline(OpPassManager &funcPassManager) {
void addCPUDefaultPassPipeline(OpPassManager &funcPassManager,
FailureOr<TilingConfig> &tilingConfig) {
addTileAndDistributePasses(funcPassManager);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make sense to add these passes when the TilingConfig is not present, so perhaps we should just move it into the if-body?

Here is the context for other reviewers, there are some dispatches that do not have any TilingInterface ops, so we can't find any TilingConfig on any op. I think we still need the bufferization because of the flow/hal binding ops. This is one such examples: https://gist.github.com/pashu123/0b5329e8a60311b634ce9d2230381b20

@pashu123 pashu123 requested a review from hanhanW November 6, 2024 13:33
Copy link
Contributor

@MaheshRavishankar MaheshRavishankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a test here? It seems to me that we are using the same tile sizes for distribution and vector size. Want to verify that this isnt the case.

@hanhanW
Copy link
Contributor

hanhanW commented Nov 6, 2024

Could we add a test here? It seems to me that we are using the same tile sizes for distribution and vector size. Want to verify that this isnt the case.

Yes, we're using the same tile sizes for distribution and vector sizes in this PR. It happens in non-linalg ops. For linalg op cases, the vector sizes are reduced based on the heuristic (by looking at indexing maps and limit tile sizes for some dimensions).

@pashu123 can you add some tests to https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_x86_64_lowering_strategy.mlir ? We can add a test case for a group convolution and a test case for whatever LinalgExt ops (e.g., scan or topk or others).

Copy link
Contributor

@hanhanW hanhanW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. I made a bad review comment, and I put an update for it. Please take a look. (and sorry about that)

@pashu123 pashu123 force-pushed the newtiling branch 2 times, most recently from c133b54 to 864d9df Compare November 10, 2024 17:53
Add an addition level of tiling aka vector parallel to the CPU default
pipeline. For some of the linalg op, that is not specialized through any
pipeline may hit bufferization issue, if passed through the default
pipeline. Adding an extra level of tiling takes care of such cases.
@pashu123 pashu123 force-pushed the newtiling branch 2 times, most recently from c133b54 to e19f918 Compare November 10, 2024 17:55
@pashu123 pashu123 requested a review from hanhanW November 10, 2024 18:03
Copy link
Contributor

@hanhanW hanhanW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a test here? It seems to me that we are using the same tile sizes for distribution and vector size. Want to verify that this isnt the case.

Overall looks good to me, just one final nit in the test file.

@MaheshRavishankar do you still have the concern? We have different vector sizes for Linalg ops, but we have the same vector sizes for other TilingInterface ops (which has a TODO item to improve it). It is true that the second tile size list is redundant for other TilingInterface ops for now, but it simplifies how we set up the pipeline (e.g., we don't need any special logics when adding passes to the pipeline).

@pashu123 pashu123 merged commit da286ea into iree-org:main Nov 12, 2024
36 checks passed
Groverkss pushed a commit to Groverkss/iree that referenced this pull request Dec 1, 2024
Add an additional level of tiling, aka vector parallel to the CPU
default pipeline. Some of the linalg op that is not specialized through
any pipeline may hit a bufferization issue if passed through the default
pipeline. Adding an extra level of tiling takes care of such cases.

Removes some ops (disabled for producer fusion) from dispatch Region
creation. They were added in iree-org#18777
. For more info: iree-org#18900
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
Add an additional level of tiling, aka vector parallel to the CPU
default pipeline. Some of the linalg op that is not specialized through
any pipeline may hit a bufferization issue if passed through the default
pipeline. Adding an extra level of tiling takes care of such cases.

Removes some ops (disabled for producer fusion) from dispatch Region
creation. They were added in iree-org#18777
. For more info: iree-org#18900

Signed-off-by: Giacomo Serafini <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants