-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
generate_loop_schedule_v2 #350
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts from a quick look here.
loopy/schedule/__init__.py
Outdated
|
||
iname_to_insns = kernel.iname_to_insns() | ||
loop_nest_around_map = defaultdict(frozenset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my view, the loop_nest_around_map
should go. The loop nest tree is a strictly more informative data structure, and it is not of size O(n^2). I think we should get rid of it. I looked through the uses, and it shouldn't be hard to replace, and it'll fix another bottleneck in linearization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the new version, I didn't touch the old implementation (v1-scheduler) to keep things simpler in this PR. So, for such cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can handle it as a separate PR.
loopy/schedule/tools.py
Outdated
Updates *tree* to make *inames_to_pull_out* a loop nesting level in | ||
*loop_nests* | ||
|
||
:returns: a :class:`tuple` ``(outer_loop_nest, inner_loop_nest)``, where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Describe the types of the arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
loopy/schedule/tools.py
Outdated
Updates *tree* to make *inames_to_pull_out* a loop nesting level in | ||
*loop_nests* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Describe which is nested "inside" and which is "outside".
96856f8
to
0203d09
Compare
Does #372 supersede this? |
e14a03d
to
2d526dc
Compare
e8b03c8
to
72334e8
Compare
6945d16
to
d74da04
Compare
d74da04
to
1aef04f
Compare
1aef04f
to
581e82e
Compare
e20c6ca
to
931c6e9
Compare
931c6e9
to
36a1a2f
Compare
f10f24c
to
c1f8e0d
Compare
c1f8e0d
to
8c42d10
Compare
8c42d10
to
15cea20
Compare
15cea20
to
5b55bcd
Compare
5b55bcd
to
f8dc959
Compare
f8dc959
to
80425bd
Compare
I was just rebasing this. You beat me to it! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a first look. I haven't really gotten very far into the core algorithm, because I'm not sure I understood _pull_out_loop_nest_tree
, which seems like it's a core operation.
@@ -930,6 +933,225 @@ def _get_persistent_hashable_arg(arg): | |||
|
|||
return wrapper | |||
|
|||
|
|||
# {{{ tree data structure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is big enough to be its own file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#694 puts the Tree
class into its own file (loopy/schedule/tree.py
).
@@ -930,6 +933,225 @@ def _get_persistent_hashable_arg(arg): | |||
|
|||
return wrapper | |||
|
|||
|
|||
# {{{ tree data structure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this has type annotations, mypy should like them (and there should be CI saying as much).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, mypy checks and passes the annotations.
@dataclass(frozen=True) | ||
class Tree(Generic[T]): | ||
""" | ||
An immutable tree implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Describe the role of the type variable T
. Specifically describe that there's one Tree
object, but many nodes T
. Maybe rename T
to NodeT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c53a9af hopefully clarifies this (and renames T
to NodeT
).
|
||
|
||
@dataclass(frozen=True) | ||
class Tree(Generic[T]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this mature enough to be in pytools?
loopy/tools.py
Outdated
_parent_to_children: "PMap[T, FrozenSet[T]]" | ||
_child_to_parent: "PMap[T, Optional[T]]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of data structure, this is a forest (i.e. it allows multiple trees). It wouldn't take much to allow that fully... YAGNI though.
|
||
loop_nests = sorted(loop_nests, key=lambda nest: tree.depth(nest)) | ||
|
||
for outer, inner in zip(loop_nests[:-1], loop_nests[1:]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that all you'd need to pass here is the innermost node in the loop nest tree, and tree.ancestors
(at least the proposed changed version) would cheaply give you this without the need to enforce these invariants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Finally made it all the way through. There's a bit to digest here, but as far as I remember, it's mostly superficial. I didn't find any flaws in the thinking. Nice job!
if outer_loop == "": | ||
continue | ||
|
||
for child in loop_nest_tree.children(outer_loop): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment that this only works because loop_nest_tree
contains a total order of loops. Otherwise it might windup with mismatched enter/leaves.
Unsubscribing... @-mention or request review once it's ready for a look or needs attention. But I'm eager to get this in soon! |
65a6b62
to
c21c092
Compare
c21c092
to
fadb652
Compare
Most review comments from here are addressed in #864. |
Implementation for finding loop nest around map in O(N.k), 'N' being the number of inames and 'k' being the max. loop depth.
For comparison, let's consider the kernel in #288: on
main
this map in computed in 5 minutes and this branch takes300.4 seconds.