Oxidize two qubit basis decomposer #12010

Merged
merged 17 commits into from
Mar 28, 2024

Conversation

mtreinish
Member

@mtreinish mtreinish commented Mar 14, 2024

Summary

This commit is the second part of migrating the default 2q unitary
synthesis method to leverage parallel Rust as described in #8774. The
eventual goal is to be able to run unitary synthesis in parallel for all
the unitary matrices in a single call from the UnitarySynthesis pass.
The TwoQubitBasisDecomposer class is one of the default decomposers used
by the unitary synthesis plugin. After this we can build an interface
that will run the decomposition in parallel for a given decomposer.

This commit re-implements the TwoQubitBasisDecomposer class in Rust,
keeping the same algorithm as the previous Python version. It builds on
#11946, and the decomposer class now uses the TwoQubitWeylDecomposition
class solely through Rust.

This commit depends on #11946 and will need to be rebased after #11946
is merged.

Details and comments

Fixes #12004

TODO:

@mtreinish mtreinish added on hold Can not fix yet performance Changelog: New Feature Include in the "Added" section of the changelog synthesis Rust This PR or issue is related to Rust code in the repository labels Mar 14, 2024
@mtreinish mtreinish added this to the 1.1.0 milestone Mar 14, 2024
@qiskit-bot
Collaborator

One or more of the following people are requested to review this:

  • @Eric-Arellano
  • @Qiskit/terra-core
  • @kevinhartman
  • @levbishop
  • @mtreinish

@mtreinish mtreinish marked this pull request as draft March 14, 2024 06:54
@mtreinish
Member Author

For some initial numbers I ran the same asv benchmarks that I did in #11946 (comment) to compare this PR against the current state of main (which doesn't have #11946 yet)

Benchmarks that have improved:

       before           after         ratio
     [230e91cb]       [627269be]
     <main>       <oxidize-two-qubit-basis-decomposer>
-     4.00±0.01ms      3.16±0.02ms     0.79  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'synthesis')
-      2.20±0.01s          1.68±0s     0.77  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)
-      1.77±0.02s          1.33±0s     0.75  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)
-         1.39±0s          948±3ms     0.68  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)
-       335±0.9ms        228±0.8ms     0.68  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)
-         386±1ms        258±0.8ms     0.67  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)
-     8.81±0.09ms      4.72±0.02ms     0.54  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'synthesis')
-         231±1ms        121±0.5ms     0.52  transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)
-     12.5±0.08ms      6.18±0.03ms     0.49  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'translator')
-       208±0.5ms        101±0.2ms     0.49  transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)
-         825±6ms          390±2ms     0.47  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)
-      17.4±0.4ms       8.13±0.2ms     0.47  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'translator')
-       209±0.6ms       97.1±0.6ms     0.47  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)
-       189±0.5ms       82.1±0.3ms     0.43  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)
-        19.4±4ms       8.37±0.8ms     0.43  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'synthesis')
-      50.8±0.3ms      20.2±0.09ms     0.40  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'translator')
-       151±0.4ms       54.6±0.4ms     0.36  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'translator')
-         208±6ms         73.2±4ms     0.35  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'synthesis')
-       454±0.5ms        156±0.9ms     0.34  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'translator')
-      1.66±0.01s          569±2ms     0.34  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'translator')
-       931±0.6ms        319±0.4ms     0.34  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'translator')
-        54.9±2ms         18.2±2ms     0.33  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'synthesis')
-        864±30ms         282±20ms     0.33  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'synthesis')
-      2.10±0.01s         685±10ms     0.33  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'synthesis')
-      4.31±0.03s       1.39±0.02s     0.32  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'synthesis')

Benchmarks that have stayed the same:

       before           after         ratio
     [230e91cb]       [627269be]
     <main>       <oxidize-two-qubit-basis-decomposer>
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 100, 'decay')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 100, 'lookahead')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 100, 'decay')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 100, 'lookahead')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(1081, 100, 'decay')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(1081, 100, 'lookahead')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(409, 100, 'decay')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(409, 100, 'lookahead')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(1081, 100, 'decay')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(1081, 100, 'lookahead')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(409, 100, 'decay')
              n/a              n/a      n/a  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(409, 100, 'lookahead')
      2.64±0.01ms      2.70±0.03ms     1.02  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'translator')
       3.11±0.02s       3.17±0.01s     1.02  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 10, 'decay')
          191±1ms          193±1ms     1.01  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'lookahead')
       21.4±0.2ms       21.6±0.1ms     1.01  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'lookahead')
        202±0.4ms          204±2ms     1.01  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'decay')
          182±1ms          183±1ms     1.01  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)
        112±0.3ms        112±0.7ms     1.01  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)
       1.87±0.01s          1.88±0s     1.00  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 10, 'lookahead')
        114±0.5ms        114±0.3ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)
       23.1±0.2ms       23.2±0.1ms     1.00  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'decay')
          290±1ms          290±2ms     1.00  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'decay')
        184±0.8ms          184±2ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)
            11498            11498     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(1081, 10, 'decay')
            13134            13134     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(1081, 10, 'lookahead')
              557              557     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 10, 'decay')
              547              547     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 10, 'lookahead')
             5231             5231     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 100, 'decay')
             5663             5663     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 100, 'lookahead')
             2955             2955     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(409, 10, 'decay')
             3945             3945     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(409, 10, 'lookahead')
           217448           217448     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(1081, 10, 'decay')
           189590           189590     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(1081, 10, 'lookahead')
             4419             4419     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(115, 10, 'decay')
             4351             4351     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(115, 10, 'lookahead')
            44362            44362     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(115, 100, 'decay')
            44711            44711     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(115, 100, 'lookahead')
            42115            42115     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(409, 10, 'decay')
            47894            47894     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(409, 10, 'lookahead')
             2565             2565     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(0)
             1403             1403     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(1)
             1403             1403     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(2)
             1296             1296     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(3)
             2705             2705     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(0)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(1)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(2)
                7                7     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(3)
             2705             2705     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(0)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(1)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(2)
                7                7     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(3)
              323              323     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(0)
              336              336     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)
              336              336     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(2)
              272              272     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)
      53.5±0.07ms       53.3±0.4ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)
        300±0.8ms          298±3ms     0.99  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'lookahead')
       51.4±0.1ms       50.6±0.4ms     0.98  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)
       80.2±0.4ms       78.9±0.7ms     0.98  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)
       82.3±0.3ms       79.8±0.2ms     0.97  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
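
For reading the tables above: the ratio column is the "after" time divided by the "before" time, so a ratio of 0.32 corresponds to roughly a 3x speedup. Below is a minimal sketch of that arithmetic, assuming the `value±error unit` timing format shown in the asv output; the helper names `parse_asv_time` and `ratio_and_speedup` are made up for illustration and are not part of asv.

```python
import re

# Unit multipliers to seconds for the suffixes asv prints.
_UNITS = {"s": 1.0, "ms": 1e-3, "μs": 1e-6, "us": 1e-6, "ns": 1e-9}


def parse_asv_time(text: str) -> float:
    """Convert an asv timing such as '4.31±0.03s' or '948±3ms' to seconds."""
    match = re.fullmatch(r"([\d.]+)(?:±[\d.]+)?(s|ms|μs|us|ns)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized asv timing: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def ratio_and_speedup(before: str, after: str) -> tuple[float, float]:
    """Return (after / before, before / after) for two asv timings."""
    before_s = parse_asv_time(before)
    after_s = parse_asv_time(after)
    return after_s / before_s, before_s / after_s


# Row for time_ibmq_backend_transpile(27, 'synthesis') in the first table.
ratio, speedup = ratio_and_speedup("4.31±0.03s", "1.39±0.02s")
print(f"ratio={ratio:.2f} speedup={speedup:.1f}x")  # ratio=0.32 speedup=3.1x
```

The error bars are ignored here; asv uses them when deciding whether a change is significant, which this sketch does not attempt to reproduce.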

@mtreinish mtreinish force-pushed the oxidize-two-qubit-basis-decomposer branch from 627269b to ad70122 Compare March 14, 2024 21:18
@mtreinish mtreinish changed the title [WIP] Oxidize two qubit basis decomposer Oxidize two qubit basis decomposer Mar 19, 2024
@mtreinish mtreinish removed the on hold Can not fix yet label Mar 19, 2024
@mtreinish mtreinish marked this pull request as ready for review March 19, 2024 20:26
@qiskit-bot
Collaborator

One or more of the following people are requested to review this:

  • @Eric-Arellano
  • @Qiskit/terra-core
  • @kevinhartman
  • @levbishop
  • @mtreinish

@coveralls

coveralls commented Mar 19, 2024

Pull Request Test Coverage Report for Build 8425179161

Details

  • 590 of 681 (86.64%) changed or added relevant lines in 4 files are covered.
  • 20 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-0.06%) to 89.266%

Changes Missing Coverage                                   Covered Lines   Changed/Added Lines   %
crates/accelerate/src/convert_2q_block_matrix.rs           3               5                     60.0%
qiskit/transpiler/passes/synthesis/unitary_synthesis.py    2               6                     33.33%
qiskit/synthesis/two_qubit/two_qubit_decompose.py          29              34                    85.29%
crates/accelerate/src/two_qubit_decompose.rs               556             636                   87.42%

Files with Coverage Reduction                              New Missed Lines   %
qiskit/synthesis/two_qubit/two_qubit_decompose.py          1                  93.61%
crates/accelerate/src/two_qubit_decompose.rs               1                  89.33%
crates/qasm2/src/lex.rs                                    2                  92.37%
crates/qasm2/src/parse.rs                                  6                  97.61%
crates/accelerate/src/euler_one_qubit_decomposer.rs        10                 89.78%

Totals
Change from base Build 8423693946: -0.06%
Covered Lines: 60154
Relevant Lines: 67387

💛 - Coveralls
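
The headline figures in this report can be cross-checked against the per-file rows. A small sketch of that arithmetic, using only the numbers quoted in the report (the variable names are illustrative):

```python
# (covered lines, changed/added lines) per file, copied from the report.
rows = {
    "crates/accelerate/src/convert_2q_block_matrix.rs": (3, 5),
    "qiskit/transpiler/passes/synthesis/unitary_synthesis.py": (2, 6),
    "qiskit/synthesis/two_qubit/two_qubit_decompose.py": (29, 34),
    "crates/accelerate/src/two_qubit_decompose.rs": (556, 636),
}

covered = sum(c for c, _ in rows.values())
changed = sum(n for _, n in rows.values())
patch_pct = 100 * covered / changed

# Overall coverage from the "Totals" section.
overall_pct = 100 * 60154 / 67387

print(f"{covered} of {changed} ({patch_pct:.2f}%)")  # 590 of 681 (86.64%)
print(f"overall {overall_pct:.3f}%")  # overall 89.266%
```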

@mtreinish
Member Author

This should be ready to go now. There might be some more performance tuning and optimization we can do to the code (please feel free to leave any suggestions inline), but the performance is looking really good in its current form. I reran the benchmarks from above against current main (which now includes #11946) and it resulted in:

Benchmarks that have improved:

       before           after         ratio
     [00b0952c]       [4320a623]
     <main>       <oxidize-two-qubit-basis-decomposer>
-      1.50±0.03s       1.36±0.01s     0.91  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)
-      1.02±0.02s          902±6ms     0.88  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)
-         304±2ms          267±3ms     0.88  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)
-       154±0.3ms          121±2ms     0.78  transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)
-       138±0.4ms          105±2ms     0.77  transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)
-       129±0.1ms         97.7±2ms     0.76  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)
-         534±3ms          398±5ms     0.74  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)
-       120±0.9ms       84.5±0.3ms     0.70  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)
-      7.78±0.5ms       4.40±0.2ms     0.57  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'synthesis')
-        14.0±3ms         7.14±1ms     0.51  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'synthesis')
-      11.5±0.3ms       5.41±0.7ms     0.47  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'translator')
-      15.7±0.4ms       6.90±0.3ms     0.44  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'translator')
-      46.4±0.1ms       16.4±0.2ms     0.35  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'translator')
-        51.9±4ms       16.9±0.7ms     0.33  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'synthesis')
-       138±0.9ms       42.5±0.5ms     0.31  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'translator')
-        717±20ms          215±7ms     0.30  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'synthesis')
-      1.49±0.01s          445±4ms     0.30  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'translator')
-         415±3ms          123±2ms     0.30  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'translator')
-         846±2ms          246±2ms     0.29  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'translator')
-        190±10ms         54.8±3ms     0.29  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'synthesis')
-      3.60±0.09s          945±7ms     0.26  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'synthesis')
-      1.84±0.03s          474±8ms     0.26  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'synthesis')

Benchmarks that have stayed the same:

       before           after         ratio
     [00b0952c]       [4320a623]
     <main>       <oxidize-two-qubit-basis-decomposer>
         80.9±1ms         82.2±2ms     1.02  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)
       3.19±0.01s       3.22±0.01s     1.01  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 10, 'decay')
       1.93±0.01s       1.94±0.01s     1.00  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 10, 'lookahead')
        115±0.5ms        115±0.9ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)
       55.0±0.1ms       55.1±0.3ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)
         52.6±1ms      52.7±0.09ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)
          181±2ms        181±0.2ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)
            11498            11498     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(1081, 10, 'decay')
            13134            13134     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(1081, 10, 'lookahead')
              557              557     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 10, 'decay')
              547              547     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 10, 'lookahead')
             5231             5231     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 100, 'decay')
             5663             5663     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 100, 'lookahead')
             2955             2955     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(409, 10, 'decay')
             3945             3945     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(409, 10, 'lookahead')
           217448           217448     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(1081, 10, 'decay')
           189590           189590     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(1081, 10, 'lookahead')
             4419             4419     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(115, 10, 'decay')
             4351             4351     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(115, 10, 'lookahead')
            44362            44362     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(115, 100, 'decay')
            44711            44711     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(115, 100, 'lookahead')
            42115            42115     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(409, 10, 'decay')
            47894            47894     1.00  quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(409, 10, 'lookahead')
             2565             2565     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(0)
             1403             1403     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(1)
             1403             1403     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(2)
             1296             1296     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(3)
             2705             2705     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(0)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(1)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(2)
                7                7     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(3)
             2705             2705     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(0)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(1)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(2)
                7                7     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(3)
              323              323     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(0)
              336              336     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)
              336              336     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(2)
              272              272     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)
        114±0.5ms        114±0.8ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)
       82.6±0.5ms       82.5±0.4ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)
          181±2ms        181±0.5ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)
        296±0.8ms          296±1ms     1.00  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'decay')
       2.61±0.1ms      2.60±0.05ms     1.00  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'translator')
       22.0±0.2ms       22.0±0.1ms     1.00  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'lookahead')
          200±1ms          198±1ms     0.99  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'lookahead')
          210±2ms          207±1ms     0.99  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'decay')
          316±4ms          307±2ms     0.97  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'lookahead')
       3.28±0.1ms      3.10±0.03ms     0.95  quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'synthesis')
       1.84±0.01s       1.71±0.01s     0.93  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)
       25.3±0.8ms       23.5±0.1ms     0.93  quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'decay')
          265±2ms          241±1ms     0.91  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Contributor

@kevinhartman kevinhartman left a comment


This looks mostly good to me, at least from a Rust perspective.

The performance improvements are fantastic! 😄

@kevinhartman kevinhartman enabled auto-merge March 25, 2024 20:53
@kevinhartman kevinhartman added this pull request to the merge queue Mar 25, 2024
@mtreinish mtreinish removed this pull request from the merge queue due to a manual request Mar 25, 2024
@mtreinish
Member Author

As keen as I am to see this merge I've pulled it from the merge queue just to give @levbishop a chance to review it too.

Member

@levbishop levbishop left a comment


This is generally fine as a translation of the python.

I do worry that it further locks in some possibly non-optimal choices from the Python. In particular, forcing an explicit 1q decomposition at the 2q decomposition level is probably not the best overall strategy for the future (since it will almost certainly be immediately converted back to a unitary matrix, collected, and resynthesized by the 1q optimizations).

Tracking the num_basis_gates() flow through the code is a bit of a maze. Python num_basis_gates calls Rust num_basis_gates, which calls _num_basis_gates, which calls __num_basis_gates.

I'm interested to try removing the layer of wrapping of the Rust object inside a Python object, but I think we can explore that in future PRs without breaking any API or anything.

I do very much like the perf delta from this work!
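
For context on what the `num_basis_gates` chain mentioned above ultimately computes, here is a minimal pure-Python sketch. It is written from memory of the earlier Python implementation, not taken from this PR's diff, so treat the formulas as an assumption to check against the source. It also takes the target's Weyl coordinates `(a, b, c)` directly, whereas the real method accepts a unitary matrix and derives the coordinates through `TwoQubitWeylDecomposition`. The parameter `basis_b` is the `b` Weyl coordinate of the basis gate, assumed here to be 0 as for CX.

```python
import math


def trace_to_fidelity(trace: complex) -> float:
    """Average gate fidelity of a two-qubit unitary from its Hilbert-Schmidt trace."""
    return (4 + abs(trace) ** 2) / 20


def traces(a: float, b: float, c: float, basis_b: float) -> list:
    """Best achievable trace overlap with the target using 0, 1, 2 or 3 basis gates."""
    return [
        4 * complex(
            math.cos(a) * math.cos(b) * math.cos(c),
            math.sin(a) * math.sin(b) * math.sin(c),
        ),
        4 * complex(
            math.cos(math.pi / 4 - a) * math.cos(basis_b - b) * math.cos(c),
            math.sin(math.pi / 4 - a) * math.sin(basis_b - b) * math.sin(c),
        ),
        4 * math.cos(c),
        4,
    ]


def num_basis_gates(a, b, c, basis_b=0.0, basis_fidelity=1.0) -> int:
    """Pick the basis-gate count with the highest expected fidelity.

    Using i basis gates costs a factor of basis_fidelity**i, so an
    approximate decomposition with fewer gates can win on noisy hardware.
    """
    expected = [
        trace_to_fidelity(t) * basis_fidelity**i
        for i, t in enumerate(traces(a, b, c, basis_b))
    ]
    # max() returns the first maximal index, i.e. the fewest gates on ties.
    return max(range(4), key=expected.__getitem__)


q = math.pi / 4
print(num_basis_gates(0, 0, 0))  # identity-like target -> 0
print(num_basis_gates(q, 0, 0))  # CX-like target -> 1
print(num_basis_gates(q, q, 0))  # iSWAP-like target -> 2
print(num_basis_gates(q, q, q))  # SWAP-like target -> 3
```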

@mtreinish mtreinish added this pull request to the merge queue Mar 28, 2024
@mtreinish
Member Author

Yeah, I think there is a round of cleanup we'll want to do to simplify things. As you say, I think that'll be fairly straightforward once this merges, without any API implications. Porting the algorithm was large and unwieldy enough that I wasn't super concerned with the exact interface, thinking we could iterate on it later, especially because the Rust code is explicitly private.

Merged via the queue into Qiskit:main with commit b9ee758 Mar 28, 2024
12 checks passed
@mtreinish mtreinish deleted the oxidize-two-qubit-basis-decomposer branch March 28, 2024 19:23
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Apr 5, 2024
This commit tweaks the heuristic effort in optimization level 2 to be
more of a middle ground between level 1 and 3; with a better balance
between output quality and runtime. This places it to be a better
default for a pass manager we use if one isn't specified. The
tradeoff here is that the vf2layout and vf2postlayout search space is
reduced to be the same as level 1. There are diminishing margins of
return on the vf2 layout search especially for cases when there are a
large number of qubit permutations for the mapping found. Then the
number of sabre trials is brought up to the same level as optimization
level 3, as this can have a significant impact on output and the extra
runtime cost is minimal. The larger change is that the optimization
passes from level 3 are now run in level 2. This ends up mainly being
2q peephole optimization.
With the performance improvements from Qiskit#12010 and Qiskit#11946 and all the
follow-on PRs this is now fast enough to rely on in optimization level
2.
github-merge-queue bot pushed a commit that referenced this pull request Apr 23, 2024
* Increase heuristic effort for optimization level 2

This commit tweaks the heuristic effort in optimization level 2 to be
more of a middle ground between level 1 and 3; with a better balance
between output quality and runtime. This places it to be a better
default for a pass manager we use if one isn't specified. The
tradeoff here is that the vf2layout and vf2postlayout search space is
reduced to be the same as level 1. There are diminishing margins of
return on the vf2 layout search especially for cases when there are a
large number of qubit permutations for the mapping found. Then the
number of sabre trials is brought up to the same level as optimization
level 3, as this can have a significant impact on output and the extra
runtime cost is minimal. The larger change is that the optimization
passes from level 3 are now run in level 2. This ends up mainly being
2q peephole optimization.
With the performance improvements from #12010 and #11946 and all the
follow-on PRs this is now fast enough to rely on in optimization level
2.

* Add test workaround from level 3 to level 2 too

* Expand vf2 call limit on VF2Layout

For the initial VF2Layout call this commit expands the vf2 call limit
back to the previous level instead of reducing it to the same as level 1.
The idea behind making this change is that spending up to 10s to find a
perfect layout is a worthwhile tradeoff as that will greatly improve the
result from execution. But scoring multiple layouts to find the lowest
error rate subgraph has a diminishing margin of return in most cases as
there typically aren't thousands of unique subgraphs and often when we
hit the scoring limit it's just permuting the qubits inside a subgraph
which doesn't provide the most value.

For VF2PostLayout the lower call limits from level 1 are still used. This
is because the search for isomorphic subgraphs is typically much
shorter with the vf2++ node ordering heuristic, so we don't need to spend
as much time looking for alternative subgraphs.

* Move 2q peephole outside of optimization loop in O2

Due to potential instability in the 2q peephole optimization we run, we
were using the `MinimumPoint` pass to provide backtracking when we reach
a local minimum. However, this pass adds a significant amount of
overhead because it deep copies the circuit at every iteration of the
optimization loop that improves the output quality. This commit tweaks
the O2 pass manager construction to only run 2q peephole once, and then
updates the optimization loop to be what the previous O2 optimization
loop was.
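
The two loop shapes described in the last commit message can be contrasted with a toy sketch. Everything here is a stand-in chosen for illustration: a "circuit" is a list of gate names, the "peephole" pass just drops `id` gates, and none of the function names are Qiskit APIs. The first function mimics a `MinimumPoint`-style loop that deep copies the circuit whenever the score improves; the second runs the expensive pass once and then iterates the cheap passes to a fixed point without snapshots.

```python
import copy


def drop_ids(gates):
    """Toy stand-in for the expensive peephole pass: remove identity gates."""
    return [g for g in gates if g != "id"]


def cancel_first_pair(gates):
    """Toy cheap pass: cancel the first adjacent pair of equal self-inverse gates."""
    for i in range(len(gates) - 1):
        if gates[i] == gates[i + 1]:
            return gates[:i] + gates[i + 2:]
    return gates


def optimize_with_backtracking(circuit, passes, score):
    """Keep a deep-copied best circuit; copy again on every improving iteration."""
    best = copy.deepcopy(circuit)
    copies = 1
    while True:
        for run_pass in passes:
            circuit = run_pass(circuit)
        if score(circuit) < score(best):
            best = copy.deepcopy(circuit)
            copies += 1
        else:
            return best, copies


def optimize_peephole_once(circuit, peephole, loop_passes, score):
    """Run the expensive pass once, then loop the cheap passes to a fixed point."""
    circuit = peephole(circuit)
    while True:
        before = score(circuit)
        for run_pass in loop_passes:
            circuit = run_pass(circuit)
        if score(circuit) >= before:
            return circuit


toy = ["h", "id", "cx", "cx", "h", "x"]
best, copies = optimize_with_backtracking(toy, [drop_ids, cancel_first_pair], len)
once = optimize_peephole_once(toy, drop_ids, [cancel_first_pair], len)
print(best, copies)  # ['x'] 3
print(once)  # ['x']
```

On this toy input both shapes reach the same result, but the first one pays for three deep copies. On a real circuit each copy is proportional to the circuit size, which is the overhead the commit message describes.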
Labels
Changelog: New Feature Include in the "Added" section of the changelog performance Rust This PR or issue is related to Rust code in the repository synthesis
Development

Successfully merging this pull request may close these issues.

Port TwoQubitBasisDecomposition to Rust
6 participants