Resolve aliases before checking for unique values (bytecodealliance#4966

) At control-flow join points, cranelift-frontend's SSA builder currently checks to see if only one definition of a variable reaches the current block. If so, it can eliminate the corresponding block parameter and use the original def directly. It implements this by turning the block parameter into an alias for the original value. However, it didn't resolve aliases during this check, except after it had already determined that there was only one definition. Resolving aliases first instead allows it to detect that more block parameters are redundant. And as more block parameters get converted to aliases, later blocks can see common definitions from further away, so this has a compounding effect. This also merges a special case, where there's exactly one unique non-sentinel definition but it's actually an alias for the sentinel, into the general case where all definitions are from the sentinel. As a result there's only one case that has to introduce a definition of the variable to zero. According to `valgrind --tool=dhat`, this is a significant memory savings. On the pulldown-cmark benchmark from Sightglass: - 15.3% (1.9MiB) less memory allocated at maximum heap - 4.1% (6.7MiB) less memory allocated in total - 9.8% (57MiB) fewer bytes read - 12.6% (36MiB) fewer bytes written - 5.4% fewer instructions retired - 1.04x faster by instructions retired (per Sightglass/perf) - 1.03x to 1.04x faster by CPU cycles (per Sightglass/perf) - 1.03 ± 0.01 times faster by CPU time (per hyperfine) - 1.04x faster by cache accesses (per Sightglass/perf) On the bz2 benchmark: - 1.06x faster by instructions retired (per Sightglass/perf) - 1.05x faster by CPU cycles (per Sightglass/perf) - 1.04 ± 0.01 times faster by CPU time (per hyperfine) - 1.02x to 1.03x faster by cache accesses (per Sightglass/perf) Even on the largest benchmark in Sightglass (spidermonkey.wasm), this is a measurable improvement: - 1.03x faster by instructions retired (per Sightglass/perf) - 1.02x faster by CPU cycles (per Sightglass/perf) - 1.02 ± 0.00 times faster by CPU time (per hyperfine) There was no significant difference in cache misses for any benchmark, according to Sightglass/perf.
gurry · Sep 27, 2022 · 9715d91 · 9715d91
1 parent 29c7de7
commit 9715d91
Showing 1 changed file with 15 additions and 14 deletions.
diff --git a/cranelift/frontend/src/ssa.rs b/cranelift/frontend/src/ssa.rs
@@ -600,9 +600,20 @@ impl SSABuilder {
     ) {
         let mut pred_values: ZeroOneOrMore<Value> = ZeroOneOrMore::Zero;
 
-        // Determine how many predecessors are yielding unique, non-temporary Values.
+        // Determine how many predecessors are yielding unique, non-temporary Values. If a variable
+        // is live and unmodified across several control-flow join points, earlier blocks will
+        // introduce aliases for that variable's definition, so we resolve aliases eagerly here to
+        // ensure that we can tell when the same definition has reached this block via multiple
+        // paths. Doing so also detects cyclic references to the sentinel, which can occur in
+        // unreachable code.
         let num_predecessors = self.predecessors(dest_block).len();
-        for &pred_val in self.results.iter().rev().take(num_predecessors) {
+        for pred_val in self
+            .results
+            .iter()
+            .rev()
+            .take(num_predecessors)
+            .map(|&val| func.dfg.resolve_aliases(val))
+        {
             match pred_values {
                 ZeroOneOrMore::Zero => {
                     if pred_val != sentinel {
@@ -648,19 +659,9 @@ impl SSABuilder {
                 // so we don't need to have it as a block argument.
                 // We need to replace all the occurrences of val with pred_val but since
                 // we can't afford a re-writing pass right now we just declare an alias.
-                // Resolve aliases eagerly so that we can check for cyclic aliasing,
-                // which can occur in unreachable code.
-                let mut resolved = func.dfg.resolve_aliases(pred_val);
-                if sentinel == resolved {
-                    // Cycle detected. Break it by creating a zero value.
-                    resolved = emit_zero(
-                        func.dfg.value_type(sentinel),
-                        FuncCursor::new(func).at_first_insertion_point(dest_block),
-                    );
-                }
                 func.dfg.remove_block_param(sentinel);
-                func.dfg.change_to_alias(sentinel, resolved);
-                resolved
+                func.dfg.change_to_alias(sentinel, pred_val);
+                pred_val
             }
             ZeroOneOrMore::More => {
                 // There is disagreement in the predecessors on which value to use so we have