-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LLVMCPU] Add an additional level of tiling #19027
Conversation
399fadd
to
ba3344f
Compare
// Add an extra level of tiling. | ||
if (auto linalgOp = dyn_cast<linalg::LinalgOp>(*op)) { | ||
SmallVector<int64_t> vecTileSizes = distTileSizes; | ||
limitVectorTileSizes(linalgOp, vecTileSizes); | ||
tileSizes.push_back(vecTileSizes); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll need the second level of tiling on any TilingInterface ops, so please add a TODO. It is the key to get smaller problem size; you don't need the special logics in Passes.cpp
. What I'd do is something like below
// Add an extra level of tiling. | |
if (auto linalgOp = dyn_cast<linalg::LinalgOp>(*op)) { | |
SmallVector<int64_t> vecTileSizes = distTileSizes; | |
limitVectorTileSizes(linalgOp, vecTileSizes); | |
tileSizes.push_back(vecTileSizes); | |
} | |
SmallVector<int64_t> vecTileSizes = distTileSizes; | |
// TODO: ... | |
if (auto linalgOp = dyn_cast<linalg::LinalgOp>(*op)) { | |
limitVectorTileSizes(linalgOp, vecTileSizes); | |
} | |
tileSizes.push_back(vecTileSizes); |
@@ -653,8 +653,14 @@ void addCPULinalgExtTileAndVectorizePipeline( | |||
} | |||
} | |||
|
|||
void addCPUDefaultPassPipeline(OpPassManager &funcPassManager) { | |||
void addCPUDefaultPassPipeline(OpPassManager &funcPassManager, | |||
FailureOr<TilingConfig> &tilingConfig) { | |||
addTileAndDistributePasses(funcPassManager); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not make sense to add these passes when the TilingConfig is not present, so perhaps we should just move it into the if-body?
Here is the context for other reviewers, there are some dispatches that do not have any TilingInterface ops, so we can't find any TilingConfig on any op. I think we still need the bufferization because of the flow/hal binding ops. This is one such examples: https://gist.github.com/pashu123/0b5329e8a60311b634ce9d2230381b20
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a test here? It seems to me that we are using the same tile sizes for distribution and vector size. Want to verify that this isnt the case.
Yes, we're using the same tile sizes for distribution and vector sizes in this PR. It happens in non-linalg ops. For linalg op cases, the vector sizes are reduced based on the heuristic (by looking at indexing maps and limit tile sizes for some dimensions). @pashu123 can you add some tests to https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_x86_64_lowering_strategy.mlir ? We can add a test case for a group convolution and a test case for whatever LinalgExt ops (e.g., scan or topk or others). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me. I made a bad review comment, and I put an update for it. Please take a look. (and sorry about that)
compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_x86_64_lowering_strategy.mlir
Outdated
Show resolved
Hide resolved
c133b54
to
864d9df
Compare
Add an addition level of tiling aka vector parallel to the CPU default pipeline. For some of the linalg op, that is not specialized through any pipeline may hit bufferization issue, if passed through the default pipeline. Adding an extra level of tiling takes care of such cases.
c133b54
to
e19f918
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a test here? It seems to me that we are using the same tile sizes for distribution and vector size. Want to verify that this isnt the case.
Overall looks good to me, just one final nit in the test file.
@MaheshRavishankar do you still have the concern? We have different vector sizes for Linalg ops, but we have the same vector sizes for other TilingInterface ops (which has a TODO item to improve it). It is true that the second tile size list is redundant for other TilingInterface ops for now, but it simplifies how we set up the pipeline (e.g., we don't need any special logics when adding passes to the pipeline).
compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_x86_64_lowering_strategy.mlir
Outdated
Show resolved
Hide resolved
Add an additional level of tiling, aka vector parallel to the CPU default pipeline. Some of the linalg op that is not specialized through any pipeline may hit a bufferization issue if passed through the default pipeline. Adding an extra level of tiling takes care of such cases. Removes some ops (disabled for producer fusion) from dispatch Region creation. They were added in iree-org#18777 . For more info: iree-org#18900
Add an additional level of tiling, aka vector parallel to the CPU default pipeline. Some of the linalg op that is not specialized through any pipeline may hit a bufferization issue if passed through the default pipeline. Adding an extra level of tiling takes care of such cases. Removes some ops (disabled for producer fusion) from dispatch Region creation. They were added in iree-org#18777 . For more info: iree-org#18900 Signed-off-by: Giacomo Serafini <[email protected]>
Add an additional level of tiling, aka vector parallel to the CPU default pipeline. Some of the linalg op that is not specialized through any pipeline may hit a bufferization issue if passed through the default pipeline. Adding an extra level of tiling takes care of such cases.
Removes some ops (disabled for producer fusion) from dispatch Region creation. They were added in #18777 . For more info: #18900