Experiment to mitigate StorageId union access patterns #16939
0b90fd9
to
a4c4e9e
To make the status of this PR clear: PR feedback so far has been addressed, but work is still ongoing to check for perf regressions and address them.
d7bec23
to
6b11229
Looks good to me!
I'll be surprised if the benchmarks show a slowdown from this, but I'm often surprised by the results of benchmarks :).
I ran the benchmarks for this and found some significant performance regressions. I did the following:
$ git checkout main
# Build ECS benchmarks.
$ cargo build -p benches --bench ecs
# Run ECS benchmarks directly, pinning them to the first CPU. Save the results as a baseline named "main".
$ taskset --cpu-list 0 target/release/deps/ecs-5a85551b99999190 iter_ --bench --save-baseline main
# Switch to this PR's branch.
$ git switch try-mitigating-storage-id-union
# Rebuild benchmarks.
$ cargo build -p benches --bench ecs
# Run the new benchmarks, comparing the results with the saved baseline.
$ taskset --cpu-list 0 ../target/release/deps/ecs-5a85551b99999190 iter_ --baseline main --bench
I found significant regressions on the following:
There were several other performance gains and regressions (between 2% and 7%), which I've included in the results below. I've also included the HTML report, with all of its graphs, as a ZIP file for further analysis. Note that the two regressions above were in the microseconds and nanoseconds, so this may be negligible in an actual program. Hope this helps! I don't have enough knowledge of the ECS to figure out why the performance is regressing, but this should be a good starting point.
Benchmark Output
iter_fragmented/base time: [303.60 ns 303.65 ns 303.73 ns]
change: [-25.937% -25.724% -25.539%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
3 (3.00%) high mild
8 (8.00%) high severe
iter_fragmented/wide time: [3.9187 µs 3.9514 µs 3.9834 µs]
change: [+1.2996% +2.1145% +2.8990%] (p = 0.00 < 0.05)
Performance has regressed.
iter_fragmented/foreach time: [113.56 ns 118.16 ns 123.32 ns]
change: [+0.5727% +4.8012% +9.3791%] (p = 0.03 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mild
iter_fragmented/foreach_wide
time: [4.8465 µs 4.8818 µs 4.9181 µs]
change: [+88.946% +90.418% +91.872%] (p = 0.00 < 0.05)
Performance has regressed.
iter_fragmented_sparse/base
time: [4.6012 ns 4.6214 ns 4.6463 ns]
change: [-10.872% -9.8044% -8.9351%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
iter_fragmented_sparse/wide
time: [51.661 ns 52.295 ns 52.938 ns]
change: [+1.9106% +2.9326% +3.9490%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
iter_fragmented_sparse/foreach
time: [5.1473 ns 5.1730 ns 5.2047 ns]
change: [-3.2390% -1.8911% -0.6794%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
10 (10.00%) high mild
9 (9.00%) high severe
iter_fragmented_sparse/foreach_wide
time: [63.093 ns 64.442 ns 66.973 ns]
change: [+78.468% +81.425% +85.276%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
iter_simple/base time: [5.1535 µs 5.1571 µs 5.1626 µs]
change: [+0.2114% +0.3693% +0.5396%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
iter_simple/wide time: [35.295 µs 35.329 µs 35.368 µs]
change: [+0.9023% +1.0881% +1.2520%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
iter_simple/system time: [5.3246 µs 5.3248 µs 5.3251 µs]
change: [+2.9911% +3.2136% +3.3466%] (p = 0.00 < 0.05)
Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
2 (2.00%) low severe
7 (7.00%) low mild
7 (7.00%) high mild
3 (3.00%) high severe
iter_simple/sparse_set time: [15.684 µs 15.691 µs 15.700 µs]
change: [+4.2280% +4.4092% +4.5419%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
iter_simple/wide_sparse_set
time: [78.096 µs 78.192 µs 78.280 µs]
change: [+0.7110% +0.8327% +0.9447%] (p = 0.00 < 0.05)
Change within noise threshold.
iter_simple/foreach time: [5.1049 µs 5.1054 µs 5.1061 µs]
change: [+2.6787% +2.8166% +2.9249%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
iter_simple/foreach_wide
time: [38.040 µs 38.061 µs 38.083 µs]
change: [+6.5819% +6.7352% +6.8683%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
iter_simple/foreach_sparse_set
time: [14.405 µs 14.420 µs 14.436 µs]
change: [+4.1729% +4.3148% +4.4363%] (p = 0.00 < 0.05)
Performance has regressed.
Found 21 outliers among 100 measurements (21.00%)
7 (7.00%) high mild
14 (14.00%) high severe
iter_simple/foreach_wide_sparse_set
time: [79.613 µs 79.629 µs 79.647 µs]
change: [+1.0849% +1.1762% +1.2634%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
iter_simple/foreach_hybrid
time: [6.9573 µs 6.9831 µs 7.0189 µs]
change: [+3.1587% +5.7356% +8.8315%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) high mild
13 (13.00%) high severe
par_iter_simple/with_0_fragment
time: [51.424 µs 51.436 µs 51.448 µs]
change: [+0.7495% +0.8385% +0.9118%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
par_iter_simple/with_10_fragment
time: [51.470 µs 51.485 µs 51.502 µs]
change: [+0.6942% +0.7896% +0.8947%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
par_iter_simple/with_100_fragment
time: [52.126 µs 52.152 µs 52.180 µs]
change: [-0.2657% -0.0798% +0.0895%] (p = 0.39 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
par_iter_simple/with_1000_fragment
time: [59.121 µs 59.224 µs 59.339 µs]
change: [-8.9503% -7.3700% -5.9287%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
par_iter_simple/hybrid time: [144.31 µs 144.37 µs 144.42 µs]
change: [-2.7294% -1.7431% -0.6849%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 17 outliers among 100 measurements (17.00%)
10 (10.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe
iter_fragmented(4096)_empty/foreach_table
time: [1.8577 µs 1.8606 µs 1.8639 µs]
change: [-1.0602% -0.4868% -0.1011%] (p = 0.03 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
iter_fragmented(4096)_empty/foreach_sparse
time: [5.9155 µs 5.9474 µs 5.9762 µs]
change: [-1.3398% -0.8089% -0.3228%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
10 (10.00%) low mild
6b11229
to
4d8e2f1
I think I've found the source of the performance regressions. I had made a bit of a silly mistake:
@omaskery I encourage you to run the benchmarks yourself once you feel this PR is ready, but let me know if you don't use Linux or need help with the instructions. :)
@BD103 thanks, I have been running the benchmarks, and used perf diff to identify the issue I mentioned in my previous comment. I'm currently working on the feedback from @chescock - particularly how to approach the
Latest benchmarks: criterion-report.zip
Benchmark Output:
iter_fragmented/base time: [506.80 ns 508.79 ns 511.09 ns]
change: [-5.5501% -5.0584% -4.5441%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
iter_fragmented/wide time: [6.4090 µs 6.4291 µs 6.4489 µs]
change: [-1.1382% -0.6173% -0.0846%] (p = 0.02 < 0.05)
Change within noise threshold.
iter_fragmented/foreach time: [215.93 ns 223.90 ns 232.65 ns]
change: [-1.1526% +1.9866% +5.0240%] (p = 0.20 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
iter_fragmented/foreach_wide
time: [4.9398 µs 4.9622 µs 4.9835 µs]
change: [-8.7052% -6.3693% -4.1158%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
iter_fragmented_sparse/base
time: [8.7403 ns 8.8672 ns 8.9988 ns]
change: [-5.6032% -4.7100% -3.8123%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe
iter_fragmented_sparse/wide
time: [68.481 ns 69.769 ns 71.290 ns]
change: [+18.747% +21.111% +23.546%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
8 (8.00%) high mild
4 (4.00%) high severe
iter_fragmented_sparse/foreach
time: [10.037 ns 10.083 ns 10.134 ns]
change: [-14.318% -11.335% -8.2538%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
iter_fragmented_sparse/foreach_wide
time: [52.544 ns 52.876 ns 53.232 ns]
change: [+25.600% +26.624% +27.616%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
iter_simple/base time: [10.437 µs 10.653 µs 10.924 µs]
change: [+2.4153% +4.5453% +6.9547%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
iter_simple/wide time: [52.925 µs 53.135 µs 53.349 µs]
change: [-1.3019% +0.4637% +1.8934%] (p = 0.62 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
iter_simple/system time: [10.234 µs 10.255 µs 10.279 µs]
change: [-0.0113% +0.5585% +1.3869%] (p = 0.10 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
iter_simple/sparse_set time: [23.959 µs 24.060 µs 24.196 µs]
change: [-8.5315% -6.2355% -3.6832%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe
iter_simple/wide_sparse_set
time: [127.20 µs 127.49 µs 127.80 µs]
change: [+4.6723% +5.2593% +5.8406%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) low mild
5 (5.00%) high mild
1 (1.00%) high severe
iter_simple/foreach time: [10.355 µs 10.458 µs 10.573 µs]
change: [-6.1792% -4.5931% -3.2888%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
8 (8.00%) high mild
5 (5.00%) high severe
iter_simple/foreach_wide
time: [57.668 µs 57.772 µs 57.887 µs]
change: [-7.1458% -5.4034% -3.7904%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
7 (7.00%) low mild
6 (6.00%) high mild
1 (1.00%) high severe
iter_simple/foreach_sparse_set
time: [22.267 µs 22.311 µs 22.360 µs]
change: [+0.1102% +0.4825% +0.8512%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
iter_simple/foreach_wide_sparse_set
time: [133.41 µs 137.98 µs 143.67 µs]
change: [+7.4944% +9.8618% +12.698%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
iter_simple/foreach_hybrid
time: [14.422 µs 14.449 µs 14.479 µs]
change: [-13.412% -9.2318% -5.1831%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
7 (7.00%) high mild
3 (3.00%) high severe
par_iter_simple/with_0_fragment
time: [47.125 µs 47.362 µs 47.598 µs]
change: [-10.509% -8.5731% -6.9490%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
par_iter_simple/with_10_fragment
time: [48.112 µs 48.709 µs 49.548 µs]
change: [-9.1211% -6.4227% -3.2409%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
par_iter_simple/with_100_fragment
time: [48.288 µs 48.659 µs 49.032 µs]
change: [-9.2277% -7.8553% -6.5745%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
par_iter_simple/with_1000_fragment
time: [60.481 µs 61.683 µs 63.076 µs]
change: [-1.8422% +0.0320% +1.9742%] (p = 0.97 > 0.05)
No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
9 (9.00%) high mild
6 (6.00%) high severe
par_iter_simple/hybrid time: [90.960 µs 91.182 µs 91.418 µs]
change: [-11.332% -7.9212% -4.7275%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
iter_fragmented(4096)_empty/foreach_table
time: [6.0752 µs 6.0975 µs 6.1199 µs]
change: [+3.5293% +5.8784% +8.0996%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
7 (7.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
iter_fragmented(4096)_empty/foreach_sparse
time: [17.890 µs 17.995 µs 18.114 µs]
change: [+0.3374% +1.3626% +2.3091%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
7 (7.00%) high mild
5 (5.00%) high severe
Whilst I think this PR sort-of achieves its original goal, and has left performance in a decent place, I'm concerned that it's not a very elegant PR. It's not clear to me whether this PR moves the code objectively toward a better place, or not.
It feels like there is a better solution, perhaps already planned in other contributors' heads. Perhaps something to do with having traits over storage that allow for generic iteration of entities - rather than having that concern spread infectiously through the calling code with branching on the iteration type in various places, or near-duplication of code to support the different iteration types.
All of this is to say: I won't be at all upset if people think this PR is not the right move, and I am happy to close it if people feel that way :)
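To make the "traits over storage" idea a little more concrete, here is a minimal sketch. Every name in it (`EntityStorage`, `DenseTable`, `SparseArchetype`, `sum_rows`) is hypothetical and is not part of Bevy's API; it only illustrates how generic iteration might keep the dense/sparse decision out of the calling code.

```rust
/// Hypothetical abstraction: anything that can yield the rows to visit.
trait EntityStorage {
    /// Calls `f` once for every row index held by this storage.
    fn for_each_row<F: FnMut(usize)>(&self, f: F);
}

/// Stand-in for dense (table) storage: rows are contiguous.
struct DenseTable {
    len: usize,
}

/// Stand-in for sparse (archetype) storage: rows are found indirectly.
struct SparseArchetype {
    rows: Vec<usize>,
}

impl EntityStorage for DenseTable {
    fn for_each_row<F: FnMut(usize)>(&self, mut f: F) {
        for row in 0..self.len {
            f(row);
        }
    }
}

impl EntityStorage for SparseArchetype {
    fn for_each_row<F: FnMut(usize)>(&self, mut f: F) {
        for &row in &self.rows {
            f(row);
        }
    }
}

/// Calling code is generic and never branches on the iteration type.
fn sum_rows<S: EntityStorage>(storage: &S) -> usize {
    let mut total = 0;
    storage.for_each_row(|row| total += row);
    total
}

fn main() {
    let dense = DenseTable { len: 4 };
    let sparse = SparseArchetype { rows: vec![7, 2, 9] };
    println!("dense sum: {}", sum_rows(&dense));
    println!("sparse sum: {}", sum_rows(&sparse));
}
```

Because `sum_rows` is monomorphized per storage type, each instantiation contains only one iteration strategy. Whether that would match the performance of the current hand-written branches in Bevy is an open question this sketch does not answer.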
Yeah, this PR wound up a little larger than I expected. I was imagining we'd be able to change
The current code is definitely confusing, though! And we just learned where a bunch of performance pitfalls are! So I'm hoping we can capture some of that knowledge somehow. In particular, it seems that iteration is only fast because the optimizer figures out that it can hoist one of the
I'm going to mull it over for a bit, and try to give a review with fresh eyes next week.
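For readers unfamiliar with the hoisting mentioned in this comment, the sketch below shows the general pattern. It assumes the hoisted check is a loop-invariant boolean such as `is_dense`, which the comment does not state in full; the function names and the arithmetic are hypothetical stand-ins for real storage access.

```rust
/// Checks the loop-invariant flag on every iteration. This is only fast
/// if the optimizer manages to move the branch out of the loop.
fn visit_branch_inside(rows: &[u32], is_dense: bool) -> u64 {
    let mut total = 0u64;
    for &row in rows {
        if is_dense {
            total += u64::from(row);
        } else {
            total += u64::from(row) * 2;
        }
    }
    total
}

/// Same result with the branch hoisted by hand: decide once, then run a
/// loop that contains no check at all.
fn visit_branch_hoisted(rows: &[u32], is_dense: bool) -> u64 {
    if is_dense {
        rows.iter().map(|&row| u64::from(row)).sum()
    } else {
        rows.iter().map(|&row| u64::from(row) * 2).sum()
    }
}

fn main() {
    let rows = [1, 2, 3];
    println!("{}", visit_branch_inside(&rows, true));
    println!("{}", visit_branch_hoisted(&rows, false));
}
```

Both functions compute the same value. The difference is that the first relies on the compiler to notice the flag never changes, while the second makes that guarantee part of the code's structure.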
Objective
StorageId - a union of ArchetypeId and TableId
QueryIterationCursor - multiple fields are only valid, and some types vary, depending on a boolean value:
QueryState - similar to above:
enum of some kind on Discord.
is_dense used to be derived from a const generic or similar), and my proposed enum refactoring was worth considering.
Solution
is_dense before, we now do a single match on the enum.
StorageId type and made it a normal enum rather than union. I have then tried to only use this in places where the downsides are hopefully minimal, such as:
is_dense anyway)
fold_over_storage_range, but I'm open to people giving advice on how to proceed there, whether I should be braver (or not).
ArchetypeId or TableId via debug_checked_as_x() unsafely, to still preserve the idea that when the QueryIter knows whether the iteration is dense or not, it doesn't want to pay the cost of checking that again when being asked to iterate storage by a StorageId. The old code didn't pay that cost, so I was wary of doing so. If people feel that this is overly cautious perf-wise, then I can make it safe and have it fail in some way (or whatever you suggest).
Testing
bevy_ecs unit tests, but don't know what else to do. Please advise! I'm also asking in #bevy_ecs on discord.
cargo miri test -p bevy_ecs and found no issues that weren't already present on main (there are ~7 memory leaks reported on both main and my branch!).
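As a rough illustration of the Solution described above, the sketch below models `StorageId` as a plain enum with an unsafe accessor that only checks the variant in debug builds. The type definitions are simplified stand-ins rather than Bevy's real ones, and `debug_checked_as_table` and `describe` are hypothetical names chosen in the spirit of `debug_checked_as_x()`.

```rust
// Simplified stand-ins; Bevy's real id types differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TableId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ArchetypeId(u32);

/// An enum rather than a union: the tag records which id is stored.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StorageId {
    Table(TableId),
    Archetype(ArchetypeId),
}

impl StorageId {
    /// Hypothetical accessor that skips the variant check in release builds.
    ///
    /// # Safety
    /// The caller must guarantee that `self` is `StorageId::Table`.
    unsafe fn debug_checked_as_table(self) -> TableId {
        match self {
            StorageId::Table(id) => id,
            StorageId::Archetype(_) => {
                // Debug builds panic here instead of reaching the hint below.
                debug_assert!(false, "expected StorageId::Table");
                // SAFETY: the caller promised this arm is never reached.
                unsafe { std::hint::unreachable_unchecked() }
            }
        }
    }
}

/// Safe code can still do a single match on the enum.
fn describe(id: StorageId) -> String {
    match id {
        StorageId::Table(TableId(index)) => format!("table {index}"),
        StorageId::Archetype(ArchetypeId(index)) => format!("archetype {index}"),
    }
}

fn main() {
    let id = StorageId::Table(TableId(3));
    println!("{}", describe(id));
    // SAFETY: `id` was constructed as `StorageId::Table` just above.
    let table = unsafe { id.debug_checked_as_table() };
    println!("{table:?}");
}
```

The trade-off is the one raised in the description: the unsafe accessor avoids re-checking what the iterator already knows, at the cost of undefined behaviour in release builds if the caller's guarantee is ever wrong. A safe variant would return an `Option` or panic instead.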