Faster random circuits #464
base: master
Conversation
…rix inverse calculation and other functions to be allocation free
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #464      +/-   ##
==========================================
+ Coverage   82.73%   83.35%   +0.62%
==========================================
  Files          70       70
  Lines        4656     4819     +163
==========================================
+ Hits         3852     4017     +165
+ Misses        804      802       -2

☔ View full report in Codecov by Sentry.
I think this is ready for review.
Thank you for the positive reaction to this idea! :)
This is fantastic, thank you!
I have left a few comments inline, mostly focusing on the custom functions you have implemented and checking whether they are necessary on the latest Julia.
If you can also just add a changelog and bump the version number, I should be able to merge and release this quite quickly.
            @test non_reuse_version == reuse_version
        end
    end
end
May we add another testset that uses something like @allocated or @allocations to keep track of potential regressions in performance?
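A rough sketch of what such a testset could look like, assuming Julia 1.9+ for @allocations and assuming (hypothetically) that the memory object is passed as an extra positional argument to random_destabilizer; the exact call signature in this PR may differ:

using Test, QuantumClifford

@testset "Random sampling of operators does not allocate when reusing memory" begin
    n = 100
    mem = QuantumClifford.RandDestabMemory(n)
    random_destabilizer(n, mem)                     # warm-up call so compilation is not counted
    allocs = @allocations random_destabilizer(n, mem)
    @test allocs < 10                               # only a handful of small wrapper allocations expected
end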
@testset "Random sampling of operators memory reuse" begin | ||
for n in [1, test_sizes..., 200, 500] | ||
workingmemory = QuantumClifford.RandDestabMemory(n) | ||
for _ in 1:100 |
now that this is very thoroughly tested, let's drop this down to for _ in 1:2 to not make the tests take too long.
# Allocation free mod.(x,n)
function _inplace_mod!(x::Matrix{Int8}, n::Integer)
    @inbounds for i in eachindex(x)
        x[i] = mod(x[i], n)
    end
    return x
end
could you confirm that x .= mod.(x, n) is worse than a new function? On newer Julia versions I get this:
julia> a = rand(Int8, 20,20);
julia> @time _inplace_mod!(a, 5);
0.000001 seconds
julia> a = rand(Int8, 200,200);
julia> @time _inplace_mod!(a, 5);
0.000054 seconds
julia> function f(a,n)
           a.=mod.(a,n)
       end;
julia> @time f(a, 5);
0.017539 seconds (13.95 k allocations: 738.812 KiB, 99.65% compilation time)
julia> @time f(a, 5);
0.000056 seconds
Both seem equally fast and non-allocating. I prefer we just use broadcast instead of a new _inplace_mod! if there is indeed no difference in performance.
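For a comparison that excludes compilation and global-scope effects, something like BenchmarkTools.jl could settle this (a sketch, assuming the package is available and _inplace_mod! is in scope):

using BenchmarkTools

x = rand(Int8, 200, 200)
# the helper introduced in this PR
@btime _inplace_mod!(m, 5) setup=(m = copy($x)) evals=1
# the plain broadcast alternative
@btime (m .= mod.(m, 5)) setup=(m = copy($x)) evals=1
# allocation counts measured inside a function, so no global-scope noise
@ballocated _inplace_mod!(m, 5) setup=(m = copy($x)) evals=1
@ballocated (m .= mod.(m, 5)) setup=(m = copy($x)) evals=1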
function _inplace_equals!(x::BitMatrix, y::Matrix{Int8}, n::Integer)
    @inbounds for i in eachindex(y)
        x[i] = y[i] == n
    end
    return x
end
same question about using broadcast here:
julia> a = trues(10,10);
julia> b = rand(Int8, 10, 10);
julia> @time a .= b.==5;
0.047673 seconds (213.96 k allocations: 10.826 MiB, 99.95% compilation time)
julia> @time a .= b.==5;
0.000004 seconds (1 allocation: 32 bytes)
julia> function _inplace_equals!(x::BitMatrix, y::Matrix{Int8}, n::Integer)
           @inbounds for i in eachindex(y)
               x[i] = y[i] == n
           end
           return x
       end;
julia> @time _inplace_equals!(a,b,5);
0.000001 seconds
julia> f(a,b,n) = (a.=b.==n);
julia> @time f(a,b,5);
0.032928 seconds (26.55 k allocations: 1.273 MiB, 99.97% compilation time)
julia> @time f(a,b,5);
0.000003 seconds
function _mul!(C, A, B, n)
    for i in 1:n
        for j in 1:n
            for k in 1:n
                @inbounds C[i, j] += A[i, k] * B[k, j]
            end
        end
    end
end
what about LinearAlgebra.mul! -- is it measurably worse?
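Since _mul! here accumulates into C (C[i, j] += ...), the matching standard call is the five-argument LinearAlgebra.mul!(C, A, B, α, β), which computes C = A*B*α + C*β. A sketch of a comparison, assuming BenchmarkTools.jl and using small integer matrices as a stand-in for the matrices used in the PR:

using LinearAlgebra, BenchmarkTools

n = 200
A = rand(0:1, n, n)
B = rand(0:1, n, n)

# the PR's accumulating triple loop
@btime _mul!(C, $A, $B, $n) setup=(C = zeros(Int, $n, $n)) evals=1
# same accumulation via the standard library: C = A*B*true + C*true
@btime LinearAlgebra.mul!(C, $A, $B, true, true) setup=(C = zeros(Int, $n, $n)) evals=1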
function _quantum_mallows!(
    rng::AbstractRNG,
    hadamard::BitVector,
    perm::Vector{T},
    arr::Vector{T}) where {T<:Integer} # inplace version

    n = length(perm)
    for idx in 1:n
        m = n - idx + 1
        # sample h_i from given prob distribution
        l = sample_geometric_2(rng, 2 * m)
        weight = 2 * m - l
        hadamard[idx] = (weight < m)
        k = weight < m ? weight : 2 * m - weight - 1
        perm[idx] = _popat!(arr, k + 1)
    end
    return hadamard, perm
end
to avoid code repetition (really, more to avoid the need to fix a bug in two separate places in the future), could you delete most of the content of quantum_mallows and have it internally just call your new quantum_mallows! together with whatever objects need to be allocated?
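A possible shape for that delegation (a sketch; the signature and element types assumed for the existing quantum_mallows may differ from the actual code):

using Random: AbstractRNG

function quantum_mallows(rng::AbstractRNG, n::Integer)
    # allocate the scratch objects once, then defer to the in-place version
    hadamard = falses(n)          # BitVector of length n
    perm = zeros(Int, n)
    arr = collect(1:n)            # consumed by _popat! inside _quantum_mallows!
    return _quantum_mallows!(rng, hadamard, perm, arr)
end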
Motivation

Motivated by random circuits, generating a random Clifford operation can be a time-critical function. When trying to implement such circuits using QuantumClifford.jl I noticed many allocations from the random_clifford function. So I tried to implement a version of the random_clifford function that reuses memory, so that repeated generation of random gates doesn't allocate that much.

Implementation

This PR implements a RandDestabMemory type which stores all the memory needed by the already existing random_destabilizer algorithm. An instance of this type can be passed to the random_destabilizer function to reuse the same memory. The algorithm should be unchanged; however, this PR changes the code in some places to avoid allocations. Some of the functionality gets separated into helper functions, but some loops persist in the main function, which arguably hurts readability. To avoid as many allocations as possible, this PR also rewrites a few other functions like quantum_mallows to reuse memory. Most notably, the calculation of the inverse of delta' and delta_p' no longer relies on external packages and makes use of the respective structure of the matrices, which increases performance. This PR also implements this new version of random_destabilizer in the random_brickwork_clifford_circuit function.
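The intended usage pattern, as a minimal sketch (the exact way the memory object is handed to random_destabilizer is an assumption here and may differ in the final API):

using QuantumClifford, Random

n = 50
rng = MersenneTwister(42)
mem = QuantumClifford.RandDestabMemory(n)    # allocate the scratch space once
for _ in 1:1000
    d = random_destabilizer(rng, n, mem)     # hypothetical signature reusing mem on every call
    # ... use d as a random gate in the circuit ...
end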
TODO

There is still some cleanup to be done, as well as documentation and testing. However, I thought I'd share this idea already to get feedback on whether you are even interested in integrating this. I'm very open to any criticism and ideas to improve my code, and happy to work on this more.
Benchmark
Attached you can find some rudimentary benchmarks of the random_brickwork_clifford_circuit function, first without and then with the new version implemented.
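For reference, a comparison along these lines could be reproduced with BenchmarkTools.jl roughly as follows (a sketch; the assumed arguments are the number of qubits and the number of brick layers, which may not match the actual signature):

using BenchmarkTools, QuantumClifford

trial = @benchmark random_brickwork_clifford_circuit(100, 10)
# compare minimum(trial).time, trial.allocs, and trial.memory
# between master and this branch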