Oxidize the internals of Optimize1qGatesDecomposition (Qiskit#9578)

* Oxidize the internals of Optimize1qGatesDecomposition

This commit rewrites the internals of the Optimize1qGatesDecomposition transpiler pass to leverage more Rust. As circuits grow in size, the time the transpiler typically spends in Optimize1qGatesDecomposition grows linearly with the number of circuits. Since Qiskit#9185 (which converted the angle calculation in the synthesis routine to Rust), the time spent constructing intermediate DAGCircuit objects for each possible decomposition has dominated the total runtime of the pass. To alleviate this bottleneck, this commit moves as much of the circuit construction to Rust as possible. The one-qubit Euler decomposition is now done in Rust, and the sequence of gate names and corresponding angles with the lowest error rate is returned to the pass. The pass then converts that sequence into a DAGCircuit object only if the decomposition will be substituted into the output DAG. This both speeds up the computation of the output circuit and defers the creation of DAGCircuit and Gate objects until they're actually needed.

* Move all error calculation to Rust

This commit makes two changes to the split between Python and Rust in the transpiler pass code. First, the error mapping is converted to a Rust-native pyclass, which makes it more efficient to get the gate error rates into the Rust side. Second, all intermediate error scoring is now done in Rust. This is primarily to simplify the code, since we're already doing the calculation with the new class in Rust.

* Remove parallel iteration over multiple target bases

This commit removes the usage of rayon for parallel iteration over the multiple target bases. In local benchmarking, the overhead of using rayon to spawn a thread pool and divide the work over multiple threads hurt performance. The decomposition is fast enough that, for the bases of current backends, iterating serially is faster than spawning the thread pool, so it's better to just remove the overhead. We can revisit parallelism in the future if it makes sense.

* Fix small oversights in internal pass usage

This commit fixes the majority (if not all) of the test failures that occurred in earlier test runs. The failures were primarily caused by places that called private methods of the Optimize1qGatesDecomposition pass internally without accounting for the new return types of some of those methods. These call sites now handle the new return types correctly. Additionally, a small oversight in porting the numerics of the psx basis circuit generator function caused incorrect decompositions in some cases; this has been fixed by adding the missing abs() call.

* Add release note

* Simplify logic to construct error map

Co-authored-by: John Lapeyre

* Update comments, docstrings, and variable names in optimize_1q_decomposition

* Make constant list of valid bases more local

* Remove clippy unwrap suppression and use match syntax

* Update releasenotes/notes/speedup-one-qubit-optimize-pass-483429af948a415e.yaml

Co-authored-by: Jake Lishman

* Use to_object() instead of clone().into_py()

* Remove out-of-date comment

* Use FnOnce for X type in circuit_psx_gen

* Add rem_euclid comment

* Fix u3/u321 condition in _possible_decomposers

---------

Co-authored-by: John Lapeyre
Co-authored-by: Jake Lishman
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
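
The flow described in this commit message (score each candidate decomposition, keep the lowest-error one, and build output objects only when substituting) can be sketched in pure Python. This is a minimal sketch, not the Rust implementation: a plain list of `(gate_name, angles)` tuples stands in for `OneQubitGateSequence`, a per-qubit `dict` of gate errors stands in for one qubit's entry in `OneQubitGateErrorMap`, and `sequence_error`, `best_sequence`, and `maybe_substitute` are hypothetical names. The scoring rule (one minus the product of gate fidelities, or sequence length when no error data exists) follows the Python `_error` helper this commit deletes. Returning an `(error, length)` tuple is an assumption, suggested by the `[0]` index the diff applies to `self._error(...)`, and is used here to prefer shorter sequences at equal error.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for the Rust OneQubitGateSequence.
GateSequence = List[Tuple[str, List[float]]]


def sequence_error(
    sequence: GateSequence, qubit_errors: Optional[Dict[str, float]]
) -> Tuple[float, int]:
    """Return (error, length); tuples compare element by element."""
    if qubit_errors is None:
        # No error data: the sequence length is a weak proxy for error.
        return (float(len(sequence)), len(sequence))
    # Gates missing from the error map are treated as error-free.
    fidelity = math.prod(1.0 - qubit_errors.get(name, 0.0) for name, _ in sequence)
    return (1.0 - fidelity, len(sequence))


def best_sequence(
    candidates: Sequence[GateSequence], qubit_errors: Optional[Dict[str, float]]
) -> Optional[GateSequence]:
    """Serially pick the lowest-error candidate (no thread pool)."""
    if not candidates:
        return None
    return min(candidates, key=lambda seq: sequence_error(seq, qubit_errors))


def maybe_substitute(
    old_run: GateSequence,
    candidates: Sequence[GateSequence],
    qubit_errors: Optional[Dict[str, float]],
    build: Callable[[GateSequence], object],
) -> Optional[object]:
    """Build output objects only when the best candidate beats the old run."""
    best = best_sequence(candidates, qubit_errors)
    if best is None:
        return None
    if sequence_error(best, qubit_errors) < sequence_error(old_run, qubit_errors):
        return build(best)  # the expensive construction happens only here
    return None
```

Only the "improves on the old run" check from `_substitution_checks` is modeled. The real pass also substitutes when the run is uncalibrated and uses gates outside the basis, or when the new sequence's error is close to zero.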
king-p3nguin · May 22, 2023 · b555e07
1 parent 90a5e7c

Showing 6 changed files with 786 additions and 96 deletions.
diff --git a/qiskit/transpiler/passes/optimization/optimize_1q_commutation.py b/qiskit/transpiler/passes/optimization/optimize_1q_commutation.py
@@ -165,7 +165,9 @@ def _resynthesize(self, run, qubit):
         operator = run[0].op.to_matrix()
         for gate in run[1:]:
             operator = gate.op.to_matrix().dot(operator)
-        return self._optimize1q._resynthesize_run(operator, qubit)
+        return self._optimize1q._gate_sequence_to_dag(
+            self._optimize1q._resynthesize_run(operator, qubit)
+        )
 
     @staticmethod
     def _replace_subdag(dag, old_run, new_dag):

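Because `_resynthesize_run` now returns a gate sequence rather than a DAGCircuit, callers such as this commutation pass must convert the result themselves via `_gate_sequence_to_dag`. The sketch below mirrors that conversion in pure Python so the contract is visible without Qiskit installed. `GateSequence`, `Op`, `MiniCircuit`, and `gate_sequence_to_circuit` are hypothetical stand-ins for `OneQubitGateSequence`, the Qiskit gate classes, `DAGCircuit`, and `_gate_sequence_to_dag`, respectively.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class GateSequence:
    """Hypothetical stand-in for the Rust OneQubitGateSequence."""
    gates: List[Tuple[str, List[float]]]
    global_phase: float = 0.0

    def __iter__(self):
        # Iterating yields (gate_name, angles) pairs, as the pass expects.
        return iter(self.gates)


@dataclass
class Op:
    """Hypothetical gate object: a name plus its parameters."""
    name: str
    params: Tuple[float, ...]


@dataclass
class MiniCircuit:
    """Hypothetical single-qubit circuit standing in for DAGCircuit."""
    global_phase: float = 0.0
    ops: List[Op] = field(default_factory=list)


def _gate(name: str) -> Callable[..., Op]:
    # Return a constructor that takes the gate's angles positionally.
    return lambda *angles: Op(name, tuple(angles))


# Same shape as NAME_MAP: gate name -> constructor taking the angles.
NAME_MAP: Dict[str, Callable[..., Op]] = {n: _gate(n) for n in ("rz", "sx", "x", "u")}


def gate_sequence_to_circuit(seq: GateSequence) -> MiniCircuit:
    """Build the output circuit, copying the global phase first."""
    out = MiniCircuit(global_phase=seq.global_phase)
    for gate_name, angles in seq:
        # An unknown name raises KeyError.
        out.ops.append(NAME_MAP[gate_name](*angles))
    return out
```

The `KeyError` on an unknown name illustrates why the comment above `NAME_MAP` asks for the map to be updated in lockstep with the Rust `VALID_BASES` constant.
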
diff --git a/qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py b/qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py
@@ -13,15 +13,48 @@
 """Optimize chains of single-qubit gates using Euler 1q decomposer"""
 
 import logging
-from functools import partial
-import numpy as np
+import math
 
 from qiskit.transpiler.basepasses import TransformationPass
 from qiskit.transpiler.passes.utils import control_flow
 from qiskit.quantum_info.synthesis import one_qubit_decompose
+from qiskit._accelerate import euler_one_qubit_decomposer
+from qiskit.circuit.library.standard_gates import (
+    UGate,
+    PhaseGate,
+    U3Gate,
+    U2Gate,
+    U1Gate,
+    RXGate,
+    RYGate,
+    RZGate,
+    RGate,
+    SXGate,
+    XGate,
+)
+from qiskit.circuit import Qubit
+from qiskit.dagcircuit.dagcircuit import DAGCircuit
+
 
 logger = logging.getLogger(__name__)
 
+# When expanding the list of supported gates this needs to be updated in
+# lockstep with the VALID_BASES constant in src/euler_one_qubit_decomposer.rs
+# and the global variables in qiskit/quantum_info/synthesis/one_qubit_decompose.py
+NAME_MAP = {
+    "u": UGate,
+    "u1": U1Gate,
+    "u2": U2Gate,
+    "u3": U3Gate,
+    "p": PhaseGate,
+    "rx": RXGate,
+    "ry": RYGate,
+    "rz": RZGate,
+    "r": RGate,
+    "sx": SXGate,
+    "x": XGate,
+}
+
 
 class Optimize1qGatesDecomposition(TransformationPass):
     """Optimize chains of single-qubit gates by combining them into a single gate.
@@ -58,6 +91,23 @@ def __init__(self, basis=None, target=None):
             self._global_decomposers = _possible_decomposers(None)
             self._basis_gates = None
 
+        self.error_map = self._build_error_map()
+
+    def _build_error_map(self):
+        if self._target is not None:
+            error_map = euler_one_qubit_decomposer.OneQubitGateErrorMap(self._target.num_qubits)
+            for qubit in range(self._target.num_qubits):
+                gate_error = {}
+                for gate, gate_props in self._target.items():
+                    if gate_props is not None:
+                        props = gate_props.get((qubit,), None)
+                        if props is not None and props.error is not None:
+                            gate_error[gate] = props.error
+                error_map.add_qubit(gate_error)
+            return error_map
+        else:
+            return None
+
     def _resynthesize_run(self, matrix, qubit=None):
         """
         Resynthesizes one 2x2 `matrix`, typically extracted via `dag.collect_1q_runs`.
@@ -81,13 +131,23 @@ def _resynthesize_run(self, matrix, qubit=None):
                 self._local_decomposers_cache[qubits_tuple] = decomposers
         else:
             decomposers = self._global_decomposers
+        best_synth_circuit = euler_one_qubit_decomposer.unitary_to_gate_sequence(
+            matrix,
+            decomposers,
+            qubit,
+            self.error_map,
+        )
+        return best_synth_circuit
 
-        new_circs = [decomposer._decompose(matrix) for decomposer in decomposers]
+    def _gate_sequence_to_dag(self, best_synth_circuit):
+        qubits = [Qubit()]
+        out_dag = DAGCircuit()
+        out_dag.add_qubits(qubits)
+        out_dag.global_phase = best_synth_circuit.global_phase
 
-        if len(new_circs) == 0:
-            return None
-        else:
-            return min(new_circs, key=partial(_error, target=self._target, qubit=qubit))
+        for gate_name, angles in best_synth_circuit:
+            out_dag.apply_operation_back(NAME_MAP[gate_name](*angles), qubits)
+        return out_dag
 
     def _substitution_checks(self, dag, old_run, new_circ, basis, qubit):
         """
@@ -115,11 +175,8 @@ def _substitution_checks(self, dag, old_run, new_circ, basis, qubit):
         #    then we _try_ to decompose, using the results if we see improvement.
         return (
             uncalibrated_and_not_basis_p
-            or (
-                uncalibrated_p
-                and _error(new_circ, self._target, qubit) < _error(old_run, self._target, qubit)
-            )
-            or np.isclose(_error(new_circ, self._target, qubit), 0)
+            or (uncalibrated_p and self._error(new_circ, qubit) < self._error(old_run, qubit))
+            or math.isclose(self._error(new_circ, qubit)[0], 0)
         )
 
     @control_flow.trivial_recurse
@@ -139,70 +196,53 @@ def run(self, dag):
             operator = run[0].op.to_matrix()
             for node in run[1:]:
                 operator = node.op.to_matrix().dot(operator)
-            new_dag = self._resynthesize_run(operator, qubit)
+            best_circuit_sequence = self._resynthesize_run(operator, qubit)
 
             if self._target is None:
                 basis = self._basis_gates
             else:
                 basis = self._target.operation_names_for_qargs((qubit,))
 
-            if new_dag is not None and self._substitution_checks(dag, run, new_dag, basis, qubit):
+            if best_circuit_sequence is not None and self._substitution_checks(
+                dag, run, best_circuit_sequence, basis, qubit
+            ):
+                new_dag = self._gate_sequence_to_dag(best_circuit_sequence)
                 dag.substitute_node_with_dag(run[0], new_dag)
                 # Delete the other nodes in the run
                 for current_node in run[1:]:
                     dag.remove_op_node(current_node)
 
         return dag
 
+    def _error(self, circuit, qubit):
+        """
+        Calculate a rough error for a `circuit` that runs on a specific
+        `qubit` of `target` (`circuit` can either be a OneQubitGateSequence
+        from Rust or a list of DAGOpNodes).
+
+        Use basis errors from target if available, otherwise use length
+        of circuit as a weak proxy for error.
+        """
+        if isinstance(circuit, euler_one_qubit_decomposer.OneQubitGateSequence):
+            return euler_one_qubit_decomposer.compute_error_one_qubit_sequence(
+                circuit, qubit, self.error_map
+            )
+        else:
+            circuit_list = [(x.op.name, []) for x in circuit]
+            return euler_one_qubit_decomposer.compute_error_list(
+                circuit_list, qubit, self.error_map
+            )
+
 
 def _possible_decomposers(basis_set):
     decomposers = []
     if basis_set is None:
-        decomposers = [
-            one_qubit_decompose.OneQubitEulerDecomposer(basis, use_dag=True)
-            for basis in one_qubit_decompose.ONE_QUBIT_EULER_BASIS_GATES
-        ]
+        decomposers = list(one_qubit_decompose.ONE_QUBIT_EULER_BASIS_GATES)
     else:
         euler_basis_gates = one_qubit_decompose.ONE_QUBIT_EULER_BASIS_GATES
         for euler_basis_name, gates in euler_basis_gates.items():
             if set(gates).issubset(basis_set):
-                decomposer = one_qubit_decompose.OneQubitEulerDecomposer(
-                    euler_basis_name, use_dag=True
-                )
-                decomposers.append(decomposer)
+                decomposers.append(euler_basis_name)
+        if "U3" in decomposers and "U321" in decomposers:
+            decomposers.remove("U3")
     return decomposers
-
-
-def _error(circuit, target=None, qubit=None):
-    """
-    Calculate a rough error for a `circuit` that runs on a specific
-    `qubit` of `target` (circuit could also be a list of DAGNodes)
-
-    Use basis errors from target if available, otherwise use length
-    of circuit as a weak proxy for error.
-    """
-    if target is None:
-        if isinstance(circuit, list):
-            return len(circuit)
-        else:
-            return len(circuit._multi_graph) - 2
-    else:
-        if isinstance(circuit, list):
-            gate_fidelities = [
-                1 - getattr(target[node.name].get((qubit,)), "error", 0.0) for node in circuit
-            ]
-        else:
-            gate_fidelities = [
-                1 - getattr(target[inst.op.name].get((qubit,)), "error", 0.0)
-                for inst in circuit.op_nodes()
-            ]
-        gate_error = 1 - np.product(gate_fidelities)
-        if gate_error == 0.0:
-            if isinstance(circuit, list):
-                return -100 + len(circuit)
-            else:
-                return -100 + len(
-                    circuit._multi_graph
-                )  # prefer shorter circuits among those with zero error
-        else:
-            return gate_error
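
With this change, `_possible_decomposers` returns Euler basis names (strings) for the Rust side rather than `OneQubitEulerDecomposer` objects. Below is a minimal sketch of the selection logic. It assumes an illustrative subset of `ONE_QUBIT_EULER_BASIS_GATES` (the real table in `one_qubit_decompose.py` has more entries), and `EULER_BASIS_GATES` and `possible_decomposers` are hypothetical names.

```python
from typing import Dict, List, Optional, Set

# Illustrative subset only; the real mapping may differ and has more bases.
EULER_BASIS_GATES: Dict[str, List[str]] = {
    "U3": ["u3"],
    "U321": ["u3", "u2", "u1"],
    "U": ["u"],
    "PSX": ["p", "sx"],
    "ZSX": ["rz", "sx"],
    "ZSXX": ["rz", "sx", "x"],
}


def possible_decomposers(basis_set: Optional[Set[str]]) -> List[str]:
    """Return the Euler basis names whose gates all lie in basis_set."""
    if basis_set is None:
        # No basis constraint: try every Euler basis.
        return list(EULER_BASIS_GATES)
    decomposers = [
        name for name, gates in EULER_BASIS_GATES.items() if set(gates).issubset(basis_set)
    ]
    # When both qualify, keep only U321, as the updated pass does.
    if "U3" in decomposers and "U321" in decomposers:
        decomposers.remove("U3")
    return decomposers
```

As in the diff, the U3 entry is dropped only when a basis set is given and both U3 and U321 qualify. Presumably U321 is preferred because it covers `u3` while also being able to emit the cheaper `u1`/`u2` gates.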
diff --git a/qiskit/transpiler/passes/synthesis/unitary_synthesis.py b/qiskit/transpiler/passes/synthesis/unitary_synthesis.py
@@ -725,7 +725,9 @@ def run(self, unitary, **options):
 
         if unitary.shape == (2, 2):
             _decomposer1q = Optimize1qGatesDecomposition(basis_gates, target)
-            return _decomposer1q._resynthesize_run(unitary, qubits[0])  # already in dag format
+            return _decomposer1q._gate_sequence_to_dag(
+                _decomposer1q._resynthesize_run(unitary, qubits[0])
+            )
         elif unitary.shape == (4, 4):
             # select synthesizers that can lower to the target
             if target is not None:

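Both this 2x2 branch and the pass's own `run()` hand `_resynthesize_run` a single 2x2 unitary. In `run()`, that unitary is formed by folding the run's gate matrices with `node.op.to_matrix().dot(operator)`, so each later gate multiplies on the left. Below is a dependency-free sketch of that fold, using plain nested lists instead of NumPy arrays. `matmul2` and `fuse_run` are hypothetical helpers.

```python
from typing import List, Sequence

Matrix = List[List[complex]]


def matmul2(a: Matrix, b: Matrix) -> Matrix:
    """Plain 2x2 complex matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def fuse_run(matrices: Sequence[Matrix]) -> Matrix:
    """Collapse a run of 1q gate matrices (in circuit order) into one unitary.

    Each later gate acts after the earlier ones, so it multiplies on the
    left, matching ``operator = node.op.to_matrix().dot(operator)``.
    """
    operator = matrices[0]
    for m in matrices[1:]:
        operator = matmul2(m, operator)
    return operator
```

Getting the order wrong silently produces a different unitary for non-commuting gates. For example, applying S then H gives H·S, not S·H.
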
diff --git a/releasenotes/notes/speedup-one-qubit-optimize-pass-483429af948a415e.yaml b/releasenotes/notes/speedup-one-qubit-optimize-pass-483429af948a415e.yaml
@@ -0,0 +1,10 @@
+---
+features:
+  - |
+    The runtime performance of the :class:`~.Optimize1qGatesDecomposition`
+    transpiler pass has been significantly improved. This was done by both
+    rewriting all the computation for the pass in Rust and also decreasing
+    the amount of intermediate objects created as part of the pass's
+    execution. This should also correspond to a similar improvement
+    in the runtime performance of :func:`~.transpile` with the
+    ``optimization_level`` keyword argument set to ``1``, ``2``, or ``3``.