[GPU] Use scf.if for forall overhangs #19125
I see what you are doing here... it sort of makes sense... but there is some nuance here that @qedawkins was explaining to me some time ago. I'd wait for his review.
I see what you're going for but this feels like it might be a premature optimization to me. This means every time we resolve a forall we're duplicating the full body of the loop. This makes sense for simple cases like copies, but is probably overkill for larger loops. In other words, I think the decision to do this is tied to whether we want to unroll the loop.
Before landing this change, I'd like at least some signal that it's better. We can try using the ONNX model suite for this, there are some instructions for how to run it here
This will give us a report comparing model performance for a few models + overall model support numbers that can hopefully give some kind of signal.
Yeah, agreed that this is ... I could probably skip the
Yeah the use of linearize SGTM
@qedawkins That page you linked 404s for me. Do I need to get added to something?
Oh shoot, yeah, let me remove that. Let's follow up offline because someone else will need to add you.
In cases where we can't determine whether the number of workitems per workgroup evenly divides the number of items required by an scf.forall, the current code uses `scf.for %i = %id to %upperBound step %numWorkitems` so that the last loop iteration only runs on the expected fraction of workitems. This commit enables using linearize (and a step-1 loop) in the main body of the for loop that the forall is lowered to, by switching to a post-loop if statement for the overhang instead.
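To make the two lowerings concrete, here is a rough Python simulation, not the actual IREE pass or its generated IR. The function names, the `(tid, i)` visit records, and the way the overhang is computed are all assumptions made for illustration. It only shows that a step-1 loop over the full rounds plus a guarded post-loop body touches the same items as the strided loop.

```python
def strided_lowering(num_items, num_workitems):
    """Current form: each workitem runs a loop from its id, stepping by the workitem count."""
    visits = []
    for tid in range(num_workitems):  # simulate every workitem in the workgroup
        for i in range(tid, num_items, num_workitems):
            visits.append((tid, i))
    return visits


def loop_plus_if_lowering(num_items, num_workitems):
    """Proposed form: step-1 loop over full rounds, then an `if` for the overhang."""
    full_rounds = num_items // num_workitems
    overhang = num_items - full_rounds * num_workitems  # items left for a partial round
    visits = []
    for tid in range(num_workitems):
        for k in range(full_rounds):  # step-1 loop; every workitem participates
            i = k * num_workitems + tid  # linearize (k, tid) into a flat index
            visits.append((tid, i))
        if tid < overhang:  # post-loop guard: only some workitems run the extra body
            visits.append((tid, full_rounds * num_workitems + tid))
    return visits
```

Note that the loop body appears twice in the second form (once in the loop, once under the guard), which is the duplication cost raised in the review above.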