Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Display::Contents (tree-side implementation) #534

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nicoburns
Copy link
Collaborator

@nicoburns nicoburns commented Sep 11, 2023

Objective

Closes #533
Can be used to build bevyengine/bevy#9731

Todo

  • Could probably do with more tests before merging. Although due to the nature of the implementation, this features coupling with specific layout modes is low, and it's unlikely that some would work while others don't. So we probably don't need loads of tests (we should probably have at least some for each layout mode though).
  • Update release notes

Feedback wanted

  • This code includes a Vec that is allocated every time a node's children are iterated over, and which will almost always of length 1 (any non-display: contents will result in a Vec that only ever has length 1). How would people feel about depending on the smallvec crate to optimise this case?

@nicoburns nicoburns added the enhancement New feature or request label Sep 11, 2023
@nicoburns nicoburns added this to the 0.4 milestone Sep 11, 2023
@nicoburns nicoburns changed the title Display contents Implement Display::Contents Sep 11, 2023
Copy link
Collaborator

@TimJentzsch TimJentzsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-motivated change and well-documented implementation.
I think the tests you provided are sufficient for now.

Regarding the vec allocations, I think your feedback wanted section is cut off at the end.
But perhaps an enum can be used to avoid creating a vec if we don't need to? This would probably make the implementation more complex though, so if this isn't degrading the performance too much, I vote that we leave it as-is for now.
Maybe we can run a benchmark before merging this to check if this regressed performance.

@nicoburns
Copy link
Collaborator Author

@TimJentzsch

Regarding the vec allocations, I think your feedback wanted section is cut off at the end.

Ah, my question was going to be about depending on the smallvec crate for this optimisation. I've updated the original post! Benchmarks are probably a good idea.

@alice-i-cecile
Copy link
Collaborator

I'm in favor of pulling in smallvec, assuming the benchmarks actually see a difference. It's a very useful, common crate with a low footprint and we're very unlikely to be near the choke point of most program's dependency tree.

Copy link
Collaborator

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to ship this for now: we can do the smallvec change in a follow-up PR.

@nicoburns
Copy link
Collaborator Author

Unfortunately the benchmark results for this change are not good. So I think this will have to wait on a more efficient implementation.

@nicoburns nicoburns marked this pull request as draft September 14, 2023 10:45
@nicoburns nicoburns added blocked Cannot be advanced until something else changes and removed blocked Cannot be advanced until something else changes labels Sep 14, 2023
@nicoburns nicoburns force-pushed the display-contents branch 3 times, most recently from d270760 to eda4487 Compare September 25, 2023 20:52
@nicoburns nicoburns marked this pull request as ready for review September 25, 2023 21:08
@nicoburns
Copy link
Collaborator Author

@alice-i-cecile @TimJentzsch Requesting a re-review as I've had to re-implement this to fix the perf issues. This new implementation is ~5% slower than main (actually wavering between 0% and 10% depending on the benchmark and the run). Which seems pretty good for what it's doing to me. The trick is to collect the resolved children in a Vec, and then cache them for future accesses:

  • I had to use a RefCel to avoid making the children method require &mut (this seems like an appropriate use of refcell to me, and it doesn't appear to be causing perf issues. But I suppose it might cause problems if we ever go multithreaded?)
  • I tried a cache-per-node (stored on the node), but that was noticeably slower.
  • I'm not actually benchmarking nodes with Display::Contents set here. This is the extra cost for regular nodes. I would anticipate Display::Contents nodes not actually being that different. There's extra work for the parent, but then you get to skip a node, so it probably balances?
  • I haven't put this behind a feature flag, but I could. Perhaps that would be good so that people who don't need this feature could avoid the perf cost? (even though it's relatively small)

src/tree/taffy_tree/tree.rs Outdated Show resolved Hide resolved
src/tree/taffy_tree/tree.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@TimJentzsch TimJentzsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good enough for merging, we can probably iterate on the design later on if needed for additional performance :)

I'd only suggest to gate it behind a feature flag if it doesn't degrade the code quality too much, otherwise I think it's not worth it for now.

src/tree/taffy_tree/tree.rs Show resolved Hide resolved
@nicoburns nicoburns marked this pull request as draft September 26, 2023 20:59
@nicoburns
Copy link
Collaborator Author

Have just realised I also need to fix the .child and .child_count methods to use the new children logic. Will also add tests for this seeing as the current broken code is passing the existing ones!

@nicoburns nicoburns removed this from the 0.4 milestone Oct 12, 2023
@nicoburns nicoburns force-pushed the display-contents branch 4 times, most recently from b937a82 to 5ab387e Compare October 29, 2023 18:21
Add Display::Contents variant to Display enum

Modify Taffy children iterator to recurse into Display::Contents nodes

Add `Display::Contents` support to the gentest script

Add basic tests for `Display::Contents`

Fix clippy lints and document TaffyChildIter next method

Fix Yoga benchmark helpers

smallvec WIP

Precompute display:contents vec

Remove smallvec + lazy display:contents iter implementation

Remove TaffySliceIter

Fix clippy lints

Use RefMut::map to simplify iterator

Reuse Vec in children iterator cache

Use a struct for the children cache

Change order of find_child_recursive parameters

Add Display::Contents to the release notes

Fix child and child_count methods to use resolved children
@nicoburns
Copy link
Collaborator Author

Performance on this has unfortunately regressed after rebasing on top of latest main. Especially for CSS Grid where the wide 10k node test is taking 228.58 ms with this PR, up from around ~10ms. I think I'm going to try:

  • An algorithm-side implementation
  • To optimise the algorithms to only query each node's children once.
  • To cache for each node whether they have any display: contents children
  • And then in the case that they do, maybe caching the list of children (this ought to use too much memory if we only store it for nodes that actually have display: contents children)

In any case, I'm not considering this blocking for a 0.4 release.

@nicoburns nicoburns changed the title Implement Display::Contents Implement Display::Contents (tree-side implementation) Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Display::Contents
3 participants