Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use parallel-hashmap #2462

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

atetubou
Copy link
Contributor

@atetubou atetubou commented Jun 13, 2024

This makes noop build time around 12% faster in chromium.

TODO: get benchmark resulsts from hyperfine

@atetubou atetubou force-pushed the use_parallel_hashmap branch 6 times, most recently from 46ee7de to 0cc8c3b Compare June 13, 2024 10:33
@atetubou atetubou force-pushed the use_parallel_hashmap branch from 0cc8c3b to ef2fc83 Compare June 13, 2024 10:39
@digit-google
Copy link
Contributor

Interesting, can you make this a build configuration option instead?

Since this is a header-only library, just passing the path to its include directory should be sufficient. Either as a CMAKE custom variable (e.g. NINJA_PHASHMAP_INCLUDE_DIR) or a configure.py option such as --phashmap-include-dir. Both of which could set a macro used in src/hash_map.h to select the implementation.

@jhasse
Copy link
Collaborator

jhasse commented Jun 14, 2024

How does this compare against std::flat_map?

I would like to avoid adding a Git submodule since it breaks tarball downloads and is annoying in general. I think a FetchContent download would be better.

I also wouldn't like a build configuration option, because it creates another code variant to test.

@digit-google
Copy link
Contributor

Point taken. I would still welcome a configuration option that allows us to pass the path of the phashmap include directory, and thus avoid a network download, as this will greatly simplify our CI builder configurations (just like with do for GoogleTest currently :-)). How do you plan to implement FetchContent in configure.py though?

Regarding std::flat_map, this is implemented as a linear vector of (key,value) pairs, and has O(N) insertion costs. Its typically good when the number of items is small (e.g. < 1000) or when all insertions are performed initially (e.g. pass an unordered vector to the constructor, which just performs an in-place sort).

For Ninja, we have build plans with 900,000+ nodes in them, and the map are mutated in several places that can be distant in time (e.g. when dynamic dependencies are injected in it), so I don't think this type is a good fit here. Plus we don't need the ordering guarantee and the associated cost.

@atetubou
Copy link
Contributor Author

Hmm, for source code tarball we may do something like this with another tarball containing submodule too.

We can still use FetchContent and have similar functionality in configure.py, but showing caution in release page might be simpler?

@digit-google
Copy link
Contributor

I suggest keeping things simple here:

  1. If Git submodules are not appropriate, just copy the phashmap sources from a known revision under src/phashmap/ or third_party/phashmap (your choice) with the appropriate LICENSE file, and a README.md indicating the upstream URL and gut commit used to populate it. This is after all equivalent to a git checkout with one submodule (that doesn't have its own dependencies).

  2. If you don't want to do that, add an option to pass the include dir of the library at configure time for both CMake and configure.py, because I guarantee you that FetchContent / whatever download-at-configure-time solution is going to break some workflows somewhere if you don't provide an escape hatch for it.

atetubou added a commit to atetubou/ninja that referenced this pull request Jul 3, 2024
This implements option 1 in ninja-build#2462 (comment).

This uses https://github.com/greg7mdp/parallel-hashmap instead of unordered_map for faster ninja execution.
Some more context is in https://crbug.com/346910589.

This makes noop build for chromium
* 12% faster on Linux
* 2% slower on Windows
* 96% faster on macOS

```
$ hyperfine --warmup 1 \
   '~/ninja_master -C out/non_component build:buildflag_header_h' \
   '~/ninja_faster_hash -C out/non_component build:buildflag_header_h'
Benchmark 1: ~/ninja_master -C out/non_component build:buildflag_header_h
  Time (mean ± σ):      5.251 s ±  0.027 s    [User: 4.641 s, System: 0.608 s]
  Range (min … max):    5.222 s …  5.309 s    10 runs

Benchmark 2: ~/ninja_faster_hash -C out/non_component build:buildflag_header_h
  Time (mean ± σ):      4.689 s ±  0.022 s    [User: 4.055 s, System: 0.633 s]
  Range (min … max):    4.648 s …  4.737 s    10 runs

Summary
  ~/ninja_faster_hash -C out/non_component build:buildflag_header_h ran
    1.12 ± 0.01 times faster than ~/ninja_master -C out/non_component build:buildflag_header_h
```

```
Benchmark 1: C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_master -C out/non_component build:buildflag_header_h
  Time (mean ± σ):     13.161 s ±  0.105 s    [User: 10.657 s, System: 2.194 s]
  Range (min … max):   13.005 s … 13.328 s    10 runs

Benchmark 2: C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_faster_hash -C out/non_component build:buildflag_header_h
  Time (mean ± σ):     13.206 s ±  0.210 s    [User: 10.653 s, System: 2.194 s]
  Range (min … max):   12.951 s … 13.578 s    10 runs

Summary
  C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_master -C out/non_component build:buildflag_header_h ran
    1.00 ± 0.02 times faster than C:\Users\tikuta\source\repos\ninja-build\ninja\ninja_faster_hash -C out/non_component build:buildflag_header_h
```

```
$ hyperfine --warmup 1 '~/ninja_master -C out/non_component build:buildflag_header_h' '~/ninja_faster_hash -C out/non_component build:buildflag_header_h'
Benchmark 1: ~/ninja_master -C out/non_component build:buildflag_header_h
  Time (mean ± σ):     14.360 s ±  0.706 s    [User: 9.006 s, System: 1.681 s]
  Range (min … max):   12.832 s … 15.297 s    10 runs

Benchmark 2: ~/ninja_faster_hash -C out/non_component build:buildflag_header_h
  Time (mean ± σ):      7.340 s ±  0.260 s    [User: 6.263 s, System: 0.828 s]
  Range (min … max):    6.948 s …  7.799 s    10 runs

Summary
  ~/ninja_faster_hash -C out/non_component build:buildflag_header_h ran
    1.96 ± 0.12 times faster than ~/ninja_master -C out/non_component build:buildflag_header_h
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants