
[WIP] Improve performance of QubitVector::sample_measure #819

Closed
wants to merge 4 commits into from

Conversation

merav-aharoni (Contributor)

Summary

The performance of QubitVector::sample_measure can be improved.

Details and comments

This is the same improvement made in PR#808 for MPS.

  1. Instead of iterating over the full vector of probabilities, we collect only the non-zero probabilities and store them in a vector of accumulated probabilities.
  2. Since this vector is strictly increasing, we can search for the index of `rnd` using binary search rather than linear search.

I think we should actually implement this algorithm once in the base class `State` so that we don't duplicate the code.
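The two steps above can be sketched as follows. This is an illustrative, self-contained version under assumed names (`sample_measure`, `probs`, `rnds`, `idx`), not the actual QubitVector API:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the PR's scheme: collect non-zero probabilities
// into an accumulated (strictly increasing) vector, then binary-search it
// for each random number. Names are illustrative, not the real API.
std::vector<uint64_t> sample_measure(const std::vector<double> &probs,
                                     const std::vector<double> &rnds) {
  // Step 1: accumulated probabilities of the non-zero outcomes only,
  // remembering each outcome's original index.
  std::vector<double> acc;
  std::vector<uint64_t> idx;
  double total = 0.0;
  for (uint64_t i = 0; i < probs.size(); ++i) {
    if (probs[i] > 0.0) {
      total += probs[i];
      acc.push_back(total);
      idx.push_back(i);
    }
  }
  // Step 2: for each random number in [0, 1), binary-search the
  // accumulated vector for the first entry that is >= rnd.
  std::vector<uint64_t> samples;
  samples.reserve(rnds.size());
  for (double rnd : rnds) {
    auto it = std::lower_bound(acc.begin(), acc.end(), rnd);
    if (it == acc.end())
      --it; // guard against rnd landing exactly on the total
    samples.push_back(idx[it - acc.begin()]);
  }
  return samples;
}
```

With this layout, each shot costs O(log k) where k is the number of non-zero probabilities, after a single O(2^n) pass to build the accumulated vector.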

@merav-aharoni (Contributor, Author)

[attached image]

@yaelbh (Contributor) commented Jul 12, 2020

The problem here is quite general: sampling an outcome from a multinomial distribution. Are there any known efficient algorithms that we can apply?

@yaelbh (Contributor) commented Jul 15, 2020

Maybe sort the array `rnds`. Then the binary search need not be performed over the entire domain every time.

@yaelbh (Contributor) commented Jul 15, 2020

Also no need for a binary search. The algorithm should look like this:

<sort rnds>
uint rnd_index = 0;
for (sample = 0; sample < END; ++sample) {
  // accumulated_probability(sample) = sum of probabilities of outcomes 0..sample
  while (rnd_index < num_shots && rnds[rnd_index] < accumulated_probability(sample)) {
    <increase count of measurement result `sample`>
    ++rnd_index;
  }
}

Complexity is O(shots·log(shots) + 2^n), where the log factor comes from the initial sort of `rnds`.
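For concreteness, here is a runnable version of the merge-style sketch above, with the bounds check on `rnd_index` and the accumulated probability made explicit. The names (`sample_sorted`, `acc_prob`) are mine, not the PR's:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Merge-style sampler: acc_prob[s] holds the accumulated probability of
// outcomes 0..s. The random numbers are sorted once, then both arrays are
// walked in tandem, so no per-shot binary search is needed.
std::vector<uint64_t> sample_sorted(const std::vector<double> &acc_prob,
                                    std::vector<double> rnds) {
  std::sort(rnds.begin(), rnds.end());
  std::vector<uint64_t> counts(acc_prob.size(), 0);
  uint64_t rnd_index = 0;
  for (uint64_t sample = 0; sample < acc_prob.size(); ++sample) {
    // Every remaining sorted random number below the current accumulated
    // probability is one more shot yielding this measurement outcome.
    while (rnd_index < rnds.size() && rnds[rnd_index] < acc_prob[sample]) {
      ++counts[sample];
      ++rnd_index;
    }
  }
  return counts;
}
```

After the sort, each random number and each outcome is visited exactly once, matching the O(shots·log(shots) + 2^n) bound discussed above.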

@merav-aharoni (Contributor, Author)

@yaelbh, that's a nice idea, but I am not sure which algorithm will work better. The performance depends on the number of shots and on the size of the probabilities vector, so to be truly optimal we may have to implement several algorithms and choose between them based on those parameters. I am not sure that is worth the extra complication in the code, so I tried to avoid it and kept only two algorithms.
One more point to note: the size of the probabilities vector will be much smaller than 2^n in the average case, because I store only the non-zero probabilities.

@yaelbh (Contributor) commented Jul 15, 2020

@merav-aharoni But in order to store the non-zero probabilities you already make a pass over all 2^n possible results. So I don't see the need to keep several algorithms.

@merav-aharoni (Contributor, Author)

@yaelbh, would you like to implement it? That way Hiroshi can include it in his benchmarking, because I won't get to it this week.

@yaelbh (Contributor) commented Jul 15, 2020

@hhorii I'm going to open yet another pull request for sample measure. Will you be able to include it in your benchmarking?

@yaelbh (Contributor) commented Jul 19, 2020

See #836.

@hhorii (Collaborator) left a comment

This PR is probably effective for small qubit counts.

for (uint_t i = 0; i < size; i++) {
  if (!AER::Linalg::almost_equal(probability(i), 0.0)) {
    index_vec.push_back(i);
    acc_probvector.push_back(acc_probvector[j - 1] + probability(i));
@hhorii (Collaborator):

index_vec and acc_probvector will require the same size as the qubit vector in the worst case.

@merav-aharoni (Contributor, Author):

In the worst case, you are correct. But in most cases, these vectors will be much smaller than the qubit vector.

uint_t size = 1LL << num_qubits_;
uint_t j = 1;
acc_probvector.push_back(0.0);
for (uint_t i = 0; i < size; i++) {
@hhorii (Collaborator):

Iterating over the whole qubit vector in a single thread is too heavy for large qubit counts. This is probably not efficient for 25 or more qubits.

@merav-aharoni (Contributor, Author):

In my experiments I went up to 26 qubits, and the results are fairly consistent. These experiments are on random graphs. The original algorithm also passes over the entire statevector to get the probabilities, so it has the same problem; but at least in the new algorithm this is done only once.
The algorithm may not be as efficient when the depth is large, because then acc_probvector will grow. I can try to rerun the experiments with higher depths to see how efficient it is. But maybe the best benchmark would be to run your benchmarks, which represent more realistic problems. What do you think?
