
Faster random circuits #464

Open · wants to merge 2 commits into master
Conversation

@J-C-Q (Contributor) commented Jan 20, 2025

Motivation

In random-circuit simulations, generating a random Clifford operation can be a time-critical function. When implementing such circuits with QuantumClifford.jl I noticed that the random_clifford function allocates heavily, so I implemented a version of random_clifford that reuses memory, so that repeated generation of random gates allocates far less.

Implementation

This PR implements a RandDestabMemory type which stores all of the working memory needed by the already existing random_destabilizer algorithm. An instance of this type can be passed to the random_destabilizer function to reuse the same memory. The algorithm itself is unchanged; however, this PR rewrites the code in some places to avoid allocations. Some of the functionality is separated into helper functions, but some loops remain in the main function, which arguably hurts readability. To avoid as many allocations as possible, this PR also rewrites a few other functions, such as quantum_mallows, to reuse memory. Most notably, the calculation of the inverses of delta' and delta_p' no longer relies on external packages and exploits the structure of those matrices, which improves performance.
This PR also uses the new version of random_destabilizer in the random_brickwork_clifford_circuit function.
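The memory-reuse pattern the PR follows can be sketched generically. This is a minimal, self-contained illustration only: `Workspace` and `fill_random!` are hypothetical names standing in for the PR's `RandDestabMemory` and in-place `random_destabilizer`, whose actual signatures are not reproduced here.

```julia
using Random

# Hypothetical analogue of the PR's pattern; `Workspace` and
# `fill_random!` are illustrative names, not the PR's actual API.
struct Workspace
    buf::Vector{Int}          # preallocated once, reused on every call
end
Workspace(n::Integer) = Workspace(Vector{Int}(undef, n))

# In-place generator: writes into the preallocated buffer instead of
# allocating a fresh array per call.
function fill_random!(rng::AbstractRNG, ws::Workspace)
    rand!(rng, ws.buf)
    return ws.buf
end

rng = Xoshiro(42)
ws = Workspace(100)           # allocate the working memory once
out = fill_random!(rng, ws)   # repeated calls reuse ws.buf
```

The design point is that the caller owns the workspace, so a hot loop generating many random gates pays the allocation cost once up front.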

TODO

There is still some cleanup to be done, as well as documentation and testing. However, I wanted to share the idea early to get feedback on whether you are interested in integrating it at all. I'm very open to criticism and ideas for improving the code, and happy to keep working on it.

Benchmark

Attached you can find some rudimentary benchmarks of the random_brickwork_clifford_circuit function, first without and then with the new version implemented.

[two benchmark screenshots, before and after the change]


codecov bot commented Jan 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.35%. Comparing base (0d13791) to head (18e6787).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #464      +/-   ##
==========================================
+ Coverage   82.73%   83.35%   +0.62%     
==========================================
  Files          70       70              
  Lines        4656     4819     +163     
==========================================
+ Hits         3852     4017     +165     
+ Misses        804      802       -2     


@J-C-Q (Contributor, Author) commented Jan 21, 2025

I think this is ready for review.

  • I haven't added a changelog entry yet, but can do so if you want me to.
  • Also, I'm not sure whether you want this documented in the random_destabilizer docstring or somewhere else.
  • The tests are very basic: they just compare the result to the already existing implementation. Do you want me to add more, or is that enough? The existing tests seem to pass.

Thank you for the positive reaction to this idea! :)

@J-C-Q J-C-Q marked this pull request as ready for review January 21, 2025 17:05
@Krastanov (Member) left a comment

This is fantastic, thank you!

I have left a few comments, mostly focusing on the custom functions you have implemented and checking whether they are necessary on the latest Julia.

If you can also just add a changelog and bump the version number, I should be able to merge and release this quite quickly.

@test non_reuse_version == reuse_version
end
end
end
Member:

May we add another testset that uses something like @allocated or @allocations to keep track of potential performance regressions?
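A minimal sketch of such an allocation-regression testset, under the assumption that the function under test is warmed up first so compilation is not measured. The stand-in `inplace_sum!` is hypothetical; the real test would call `random_destabilizer` with a `RandDestabMemory` from QuantumClifford.jl, which is not reproduced here.

```julia
using Test

# Hypothetical stand-in for an in-place, memory-reusing function.
inplace_sum!(out::Vector{Float64}, x::Vector{Float64}) = (out[1] = sum(x); out)

const xs = rand(100)
const buf = zeros(1)
inplace_sum!(buf, xs)              # warm up: compile before measuring
@testset "memory reuse does not allocate" begin
    @test (@allocated inplace_sum!(buf, xs)) == 0
end
```

Using `const` globals keeps dispatch static, so the `@allocated` measurement reflects the function body rather than dynamic-dispatch overhead.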

@testset "Random sampling of operators memory reuse" begin
for n in [1, test_sizes..., 200, 500]
workingmemory = QuantumClifford.RandDestabMemory(n)
for _ in 1:100
Member:

Now that this is very thoroughly tested, let's drop this down to for _ in 1:2 so the tests don't take too long.

Comment on lines +218 to +224
# Allocation free mod.(x,n)
function _inplace_mod!(x::Matrix{Int8}, n::Integer)
@inbounds for i in eachindex(x)
x[i] = mod(x[i], n)
end
return x
end
Member:

Could you confirm that x .= mod.(x, n) is worse than a new function? On newer Julias I get this:

julia> a = rand(Int8, 20,20);

julia> @time _inplace_mod!(a, 5);
  0.000001 seconds

julia> a = rand(Int8, 200,200);

julia> @time _inplace_mod!(a, 5);
  0.000054 seconds

julia> function f(a,n)
       a.=mod.(a,n)
       end;

julia> @time f(a, 5);
  0.017539 seconds (13.95 k allocations: 738.812 KiB, 99.65% compilation time)

julia> @time f(a, 5);
  0.000056 seconds

Both seem equally fast and non-allocating. I'd prefer we just use broadcast instead of a new _inplace_mod! if there is indeed no difference in performance.
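The reviewer's point can be checked directly: the fused broadcast `x .= mod.(x, n)` updates the matrix in place, produces the same values as the hand-written loop, and allocates no temporary matrix once compiled.

```julia
# In-place broadcast mod, matching the behavior of _inplace_mod!.
x = rand(Int8, 200, 200)
reference = mod.(x, 5)        # allocating version, kept for comparison
x .= mod.(x, 5)               # fused into one in-place kernel
```

The dotted assignment fuses both broadcasts into a single loop over `x`, so no intermediate array for `mod.(x, 5)` is ever materialized.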

Comment on lines +226 to +231
function _inplace_equals!(x::BitMatrix, y::Matrix{Int8}, n::Integer)
@inbounds for i in eachindex(y)
x[i] = y[i] == n
end
return x
end
Member:

Same question about using broadcast here:

julia> a = trues(10,10);

julia> b = rand(Int8, 10, 10);

julia> @time a .= b.==5;
  0.047673 seconds (213.96 k allocations: 10.826 MiB, 99.95% compilation time)

julia> @time a .= b.==5;
  0.000004 seconds (1 allocation: 32 bytes)

julia> function _inplace_equals!(x::BitMatrix, y::Matrix{Int8}, n::Integer)
           @inbounds for i in eachindex(y)
               x[i] = y[i] == n
           end
           return x
       end;

julia> @time _inplace_equals!(a,b,5);
  0.000001 seconds

julia> f(a,b,n) = (a.=b.==n);

julia> @time f(a,b,5);
  0.032928 seconds (26.55 k allocations: 1.273 MiB, 99.97% compilation time)

julia> @time f(a,b,5);
  0.000003 seconds

Comment on lines +245 to +253
function _mul!(C, A, B, n)
for i in 1:n
for j in 1:n
for k in 1:n
@inbounds C[i, j] += A[i, k] * B[k, j]
end
end
end
end
Member:

What about LinearAlgebra.mul! -- is it measurably worse?
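For reference, `LinearAlgebra.mul!` computes the product in place without allocating a result matrix. One semantic difference from the quoted loop: `mul!(C, A, B)` overwrites `C`, while the loop accumulates into it; the five-argument form `mul!(C, A, B, α, β)` (computing `C = α*A*B + β*C`) reproduces the accumulating behavior.

```julia
using LinearAlgebra

A = rand(50, 50)
B = rand(50, 50)
C = zeros(50, 50)
mul!(C, A, B)                 # C = A*B, overwriting C in place
D = copy(C)
mul!(D, A, B, 1.0, 1.0)       # D = A*B + D, accumulating like the loop
```

For dense floating-point matrices this dispatches to BLAS, so it may well be faster than a hand-rolled triple loop; for the small integer matrices in this PR the difference would need measuring, as the reviewer asks.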

Comment on lines +429 to +446
function _quantum_mallows!(
rng::AbstractRNG,
hadamard::BitVector,
perm::Vector{T},
arr::Vector{T}) where {T<:Integer} # inplace version

n = length(perm)
for idx in 1:n
m = n - idx + 1
# sample h_i from given prob distribution
l = sample_geometric_2(rng, 2 * m)
weight = 2 * m - l
hadamard[idx] = (weight < m)
k = weight < m ? weight : 2 * m - weight - 1
perm[idx] = _popat!(arr, k + 1)
end
return hadamard, perm
end
Member:

To avoid code repetition (really, to avoid having to fix a bug in two separate places in the future), could you delete most of the content of quantum_mallows and have it internally just call your new _quantum_mallows! together with whatever objects need to be allocated?
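The wrapper pattern being suggested, in generic form: keep a single in-place core and let the allocating convenience method just allocate its buffers and delegate. `mallows_like` below is a hypothetical stand-in (its body is a plain random permutation, not the real Mallows sampling loop), used only to show the structure.

```julia
using Random

# In-place core: the only place the sampling logic lives.
function mallows_like!(rng::AbstractRNG, perm::Vector{Int})
    # stand-in for the real sampling loop
    copyto!(perm, randperm(rng, length(perm)))
    return perm
end

# Allocating front-end: allocates its output, then calls the `!` core,
# so any future bug fix happens in exactly one place.
mallows_like(rng::AbstractRNG, n::Integer) = mallows_like!(rng, Vector{Int}(undef, n))

p = mallows_like(Xoshiro(1), 5)
```

This mirrors the Base convention where `sort` is implemented in terms of `sort!`.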
