Skip to content

Commit

Permalink
Resolve aliases before checking for unique values (bytecodealliance#4966
Browse files Browse the repository at this point in the history
)

At control-flow join points, cranelift-frontend's SSA builder currently
checks to see if only one definition of a variable reaches the current
block. If so, it can eliminate the corresponding block parameter and use
the original def directly. It implements this by turning the block
parameter into an alias for the original value.

However, it didn't resolve aliases during this check, except after it
had already determined that there was only one definition.

Resolving aliases first instead allows it to detect that more block
parameters are redundant. And as more block parameters get converted to
aliases, later blocks can see common definitions from further away, so
this has a compounding effect.

This also merges a special case, where there's exactly one unique
non-sentinel definition but it's actually an alias for the sentinel,
into the general case where all definitions are from the sentinel. As a
result there's only one case that has to introduce a definition of the
variable to zero.

According to `valgrind --tool=dhat`, this is a significant memory
savings. On the pulldown-cmark benchmark from Sightglass:

- 15.3% (1.9MiB) less memory allocated at maximum heap
- 4.1% (6.7MiB) less memory allocated in total
- 9.8% (57MiB) fewer bytes read
- 12.6% (36MiB) fewer bytes written
- 5.4% fewer instructions retired
- 1.04x faster by instructions retired (per Sightglass/perf)
- 1.03x to 1.04x faster by CPU cycles (per Sightglass/perf)
- 1.03 ± 0.01 times faster by CPU time (per hyperfine)
- 1.04x faster by cache accesses (per Sightglass/perf)

On the bz2 benchmark:

- 1.06x faster by instructions retired (per Sightglass/perf)
- 1.05x faster by CPU cycles (per Sightglass/perf)
- 1.04 ± 0.01 times faster by CPU time (per hyperfine)
- 1.02x to 1.03x faster by cache accesses (per Sightglass/perf)

Even on the largest benchmark in Sightglass (spidermonkey.wasm), this is
a measurable improvement:

- 1.03x faster by instructions retired (per Sightglass/perf)
- 1.02x faster by CPU cycles (per Sightglass/perf)
- 1.02 ± 0.00 times faster by CPU time (per hyperfine)

There was no significant difference in cache misses for any benchmark,
according to Sightglass/perf.
  • Loading branch information
jameysharp authored Sep 27, 2022
1 parent 29c7de7 commit 9715d91
Showing 1 changed file with 15 additions and 14 deletions.
29 changes: 15 additions & 14 deletions cranelift/frontend/src/ssa.rs
Original file line number Diff line number Diff line change
Expand Up @@ -600,9 +600,20 @@ impl SSABuilder {
) {
let mut pred_values: ZeroOneOrMore<Value> = ZeroOneOrMore::Zero;

// Determine how many predecessors are yielding unique, non-temporary Values.
// Determine how many predecessors are yielding unique, non-temporary Values. If a variable
// is live and unmodified across several control-flow join points, earlier blocks will
// introduce aliases for that variable's definition, so we resolve aliases eagerly here to
// ensure that we can tell when the same definition has reached this block via multiple
// paths. Doing so also detects cyclic references to the sentinel, which can occur in
// unreachable code.
let num_predecessors = self.predecessors(dest_block).len();
for &pred_val in self.results.iter().rev().take(num_predecessors) {
for pred_val in self
.results
.iter()
.rev()
.take(num_predecessors)
.map(|&val| func.dfg.resolve_aliases(val))
{
match pred_values {
ZeroOneOrMore::Zero => {
if pred_val != sentinel {
Expand Down Expand Up @@ -648,19 +659,9 @@ impl SSABuilder {
// so we don't need to have it as a block argument.
// We need to replace all the occurrences of val with pred_val but since
// we can't afford a re-writing pass right now we just declare an alias.
// Resolve aliases eagerly so that we can check for cyclic aliasing,
// which can occur in unreachable code.
let mut resolved = func.dfg.resolve_aliases(pred_val);
if sentinel == resolved {
// Cycle detected. Break it by creating a zero value.
resolved = emit_zero(
func.dfg.value_type(sentinel),
FuncCursor::new(func).at_first_insertion_point(dest_block),
);
}
func.dfg.remove_block_param(sentinel);
func.dfg.change_to_alias(sentinel, resolved);
resolved
func.dfg.change_to_alias(sentinel, pred_val);
pred_val
}
ZeroOneOrMore::More => {
// There is disagreement in the predecessors on which value to use so we have
Expand Down

0 comments on commit 9715d91

Please sign in to comment.