intro.Rmd

# (PART) Bridging the gap {-}

# Quantum computing and quantum algorithms {#chapter-intro}

In this chapter we will introduce the preliminaries needed for in this book. 
We will extensively use linear algebra (norm of
matrices, SVD, properties of particular matrices, and so on), so the
reader is highly encouraged to refer to the appendix regarding the notations adopted.

## Getting rid of physics in quantum computing

Following one of the [lectures of
Susskind](https://www.youtube.com/watch?v=8mi0PoPvLvs), we are going to
start from a "handwavy" introduction of quantum mechanics, that starts
from few considerations and lead straight to the Schrödinger equation.
With a few mathematical tricks, we are going to justify the 4 axioms of quantum mechanics stated in the next sections intuitively. The hope is that the
reader can be gently guided from a tiny physical intuition to a greater
understanding of the **axioms of quantum mechanics**. While “axioms of quantum mechanics” implies that physics is involved, thanks to some of their formulation (see for instance in [@NC02], which we adopt), we can ignore the physics and think of them as “axioms of quantum computing”. As [Scott
Aaronson](https://www.youtube.com/watch?v=SJkbe4Rkv9c) rightly said, if
you are not a physicist, you need to remove physics from quantum
mechanics to understanding it!

The objective is that the reader should *not* feel the need to dig into
quantum mechanical details of quantum computers, but can start solely from the 4 axioms of quantum mechanics and build (a lot) from there.

At the beginning of the 20th century, when physicists started to model
quantum phenomena, they observed that the time and space evolution of quantum systems had two
properties: it was continuous and reversible (as in classical mechanics).
They decided to formalize these properties as follows.
First, they decided to model the state of a quantum system at time $p$
as a function $\psi(p)$, and they decided to model the evolution of
$\psi(p)$ for time $t$ as an operator $U(t)$ acting on $\psi(p)$s.
Let $I$ be the identity operator. Formally, the two requirements can be written as:

-   $U(\epsilon)=I-i\epsilon H$ (continuity)
-   $U^\dagger(t)U(t)=I$ (reversibility)

The first requirement reads that if we were to apply an evolution for a
small amount of time $\epsilon$, then $U$ would behave almost as the
identity $I$, differing by a small amount of another operator
$H$. The second requirement reads that the operation $U$ can be undone by applying a conjugate transpose of $U$. By doing this we obtain the identity operator. This means that we return to the initial state of the system before it evolved. From these two requirements, we can observe that:

$$I = U^\dagger(\epsilon)U(\epsilon)= (I+i\epsilon H)(I-i\epsilon H) = I -i \epsilon H + i \epsilon H^\dagger + O(\epsilon^2).$$
The only way for this equation to hold is for $H=H^\dagger$, i.e. the operator $H$ should be equal to its transpose conjugate. In mathematics, we call them Hermitian operators! (More about these in the appendix!). Now we can ask
ourselves what happens when we apply $U(\epsilon)$ to a quantum state
$\psi(t)$? Well it's simple to see now:
$$U(\epsilon)\psi(t) = \psi(t+\epsilon) = \psi(t) -i \epsilon H\psi(t).$$
With a little algebra we can rewrite the previous equation as:

$$ \frac{\psi(t+\epsilon) -  \psi(t)}{\epsilon} = -i  H\psi(t).$$ Note
that the left-hand side part of this equation can be rewritten, under the limit that $\epsilon \mapsto 0$
as a derivative:
$$\frac{d}{dt}\psi(t) = -i H \psi(t).$$
But this is the well-known Schrödinger equation! Note that, as computer scientists, we
take the right to remove some physical constant ($\hbar$) out of the
equation. What should be the takeaway of these observations? Well, first
we know that the Schrödinger equation is a differential equation whose
solution is fully determined if we were to know the initial state of our
system $\psi(p)$. Formally the solution can be written as:

$$\psi(p+t)=e^{-iHt}\psi(p).$$ From this last equation we can observe
further (more on this in the appendix) that the exponential of an
Hermitian matrix $e^{-iHt}$ is *defined* through its Taylor expansion is
just a *unitary* matrix: $U(t)=e^{-itH}$. Unitary matrices are exactly
those matrices that describe isometries: applying a unitary matrix to a
vector won't change its length. From this, we see that the two quantum
states $\psi(p+t)$ and $\psi(p)$ could be taken just to be vectors of a
fixed length, which - for practicality - we take to be unit vectors.
Notation-wise, we denote unit vectors describing quantum states as
"kets", i.e. we rewrite this equation as:

$$\ket{\psi(p+t)}=U(t)\ket{\psi(p)}$$ Hopefully, this introduction
should be enough for getting a better intuition of what comes next, and
give you a "justification" for the axioms of quantum mechanics.

## Axioms of quantum mechanics {#section-axioms}

The standard formalism used in quantum information is the Dirac's
"bra-ket" notation, which we will introduce in this section. We also
recall here the postulates of quantum mechanics, and take this
opportunity to settle the rest of the notation and preliminaries used in
this book. For the postulates, we follow the standard formulation in
[@NC02].

::: {.proposition name="Postulate 1"}
Associated to any isolated physical system is a complex vector space with inner product (that is, a Hilbert space) known as the state space of the system. The system is completely described by its state vector, which is a unit vector in the system's state space.
:::

As quantum states are described by unit vectors, we write $\ket{\psi}$
for a unit vector $\psi \in \mathcal{H}^d$, where $\mathcal{H}^d$ is a $d$ dimensional Hilbert space.
So for a non-normalized
vector $x \in \R^d$, the normalized quantum state is represented as
$\ket{x} = \norm{x}^{-1}x = \frac{1}{\norm{x}}\sum_{i=0}^n x_i\ket{i}$, where $\norm{x}$ is the $\ell_2$ norm of the vector $x$..
We denote as $\{\ket{i}\}_{i\in[d]}$ the canonical (also called
computational) basis for the Hilbert space. The transpose-conjugate of $\ket{x}$ is defined as
$\bra{x}$. We can think of $\ket{x}$ as a column vector, while $\bra{x}$
as a row vector, whose entries have been conjugated. In Dirac's
notation, we denote the inner product between two vector as
$\braket{x|y}$. Their outer product is denoted as
$\ket{x}\bra{y} = \sum_{i,j \in [d]}x_i y_j \ket{i}\bra{j} \in \mathcal{H}^d\otimes \mathcal{H}^d$.
The smallest (non-trivial) quantum system is called a qubit, and is a 2 dimensional
unit vector in $\mathbb{C}^2$. A base for this vector space in quantum
notation is denoted as $\ket{0}$ and $\ket{1}$. In this case, the vector
$\ket{\varphi} = \alpha\ket{0}+\beta\ket{1}$ for
$\alpha,\beta\in \mathbb{C}$ represent a valid quantum state as long as
$|\alpha|^2 + |\beta|^2 = 1$.

::: {.proposition name="Postulate 2"}
The evolution of a closed quantum system is described by a unitary transformation. That is, the state $\ket{\psi}$  of the system at time $t_1$ is related to the state $\ket{\psi}$  of the system at time $t_2$ by a unitary operator $U$ which depends only on the times $t_1$ and $t_2$.
:::

A matrix $U\in \mathbb{C}^{d \times d}$ is said to be unitary if
$UU^\dagger = U^\dagger U = I$, that is, if the inverse of $U$ equal to
its conjugate transpose. From this fact it follows that unitary matrices
are norm-preserving, and thus can be used as suitable mathematical
description of a pure quantum evolution. It is a standard exercise to
see that the following are all equivalent definition of unitary matrices
[@DeWolf]:

-   $\braket{Av, Aw} = \braket{v,w}$ for all $v,w$.
-   $\norm{Av} = \norm{v}$ for all $v$
-   $\norm{Av} = 1$ if and only if $\norm{v}=1$.
-   $A$ is a normal matrix with eigenvalues lying on the unit circle
-   The columns and the rows of $A$ form an orthonormal basis of $\mathcal{C}^d$
-   $A$ can be written as $e^{iH}$ for some Hermitian operator $H$.


::: {.example name="Determinant=1 is a necessary but not sufficient condition for being unitary"}
It is simple to see that any $2\times 2$ diagonal matrix $A$ with entries $10$ and $1/10$ has determinant is $1$, but it is not a unitary matrix.
:::

It will be useful to recall that if we have a unitary that performs the
mapping $\ket{a_i} \mapsto \ket{b_i}$, we can have the "matrix" form of
the operator as $\sum_i \ket{b_i}\bra{a_i}$. Recall also that the Pauli
matrices are both unitary *and* Hermitian, and this fact will be useful
in many places throughout this text.

```{=html}
<!--
# TODO Introduce more information about quantum gates
# Below Postulate 3 there are some examples of things that would be pedagogically good to have in this place.
# labels: good first issue, help wanted
-->
```
<!-- $$H = \frac{X + X}{\sqrt{2}}$$ -->

<!-- Phase gate: $$S = \frac{1}{\sqrt{2}} \begin{pmatrix} -->

<!--    1 & 0 \\ -->

<!--    0 & i -->

<!--    \end{pmatrix} = \frac{1}{\sqrt{2}} (\ket{0}\bra{0} + i\ket{1}\bra{1})$$ -->

<!-- T gate, also called as "$\pi$-over-8" gate:  -->

<!-- $$T = \begin{pmatrix} -->

<!--    1 & 0 \\ -->

<!--    0 & e^{i \frac{\pi}{4}} -->

<!--    \end{pmatrix} = e^{i\frac{\pi}{8}}\begin{pmatrix} -->

<!--    e^{-i\frac{\pi}{8}} & 0 \\ -->

<!--    0 & e^{i \frac{\pi}{8}} -->

<!--    \end{pmatrix}$$ -->

<!-- $$H = \frac{1}{\sqrt{2}}[ \bra{+}\ket{0}+ \bra{-}\ket{1} ] $$ -->

<!-- $$H \ket{x} = \frac{1}{\sqrt{2}} [ \ket{0} + (-1)^x \ket{1}] $$ -->

<!-- TOCHECK -->

<!-- $$\frac{1}{\sqrt{2^n}} \sum_{x,y \in \{0,1\}^n} (-1)^{xy} \ket{y}\bra{x}  $$ -->

```{=html}
<!--
# TODO Write paragraph on controlled rotation on ancilla qubits
# In many machine learning algorithms (like HHL), we perform a controlled rotation,
# controlled on the qubits of an acilla register, that holds the binary expansion of a number
#  between 0 and 1. Each of the qubits control a rotation of $\pi \times 2^{-j}$where $j$ is the j-th qubit in the register.
# It would be good to have that written formally. (A nice introduction can be found in the thesis of Gribling Sander)
# labels: good first issue, help wanted, code
-->
```
<!-- From Near term quantum algorithms for linear systems of equations -->

(ref:huang2019near) [@huang2019near]

::: {.exercise #ntqls name="From (ref:huang2019near)"}
Let $k \in \{0,1\}^n$ be an arbitrary $n$-bitstring.
Let $A=(\sigma_{x}^{(1)})^{k_1} \otimes \dots \otimes (\sigma_{x}^{(n)})^{k_n}$ and $\ket{b}=\ket{0^n}$.
What is the solution to the equation $A\ket{x} =\ket{b}$
:::

::: {.proposition #postulate3 name="Postulate 3"}
Quantum measurements are described by a collection $\{M_m\}$ of measurement operators. These are operators acting on the state space of the system being measured. The index $m$ refers to the measurement outcomes that may occur in the experiment. If the state of the quantum system is
$\ket{\psi}$ immediately before the measurement, then the probability that the result $m$ occurs is given by
$$
p(m) = \bra{\psi}M^\dagger_m M_m \ket{\psi}
$$
and the state of the system after the measurement is
$$
\frac{M_m\ket{\psi}}{\sqrt{\braket{\psi|M_m^\dagger M_m|\psi}}}
$$
The measurement operators satisfy the \emph{completeness equation}
$$
\sum_m M^\dagger _m M_m = I
$$
:::

In practice, we will mostly perform projective measurements (also called
von Neumann measurements). A projective measurement is described by an
*observable*: an Hermitian operator $M$ on the state space of the system
being observed. The observable has a spectral decomposition:
$$ M = \sum_m mP_m  $$ Where $P_m$ is a projector into the eigenspace of
$M$ associated with the eigenvalue $m$. This means that the measurement
operator will satisfy the following properties:

-   $P_m$ is positive definite
-   $P_m$ is Hermitian
-   $\sum_m P_m = I$
-   $(P_m)(P_n) = \delta_{mn}(P_m)$ are orthogonal projections.

Recall that an orthogonal projector $P$ has the properties that
$P=P^{\dagger}$ and $P^2 = P$. Note that the second property derives
from the first: all positive definite operators on $\mathbb{C}$ are
Hermitian (this is not always the case for positive definite operators
on $\mathbb{R}$, as it is simple to find positive definite matrices that
are not symmetric). Projective measurements can be understood as a
special case of Postulate 3: in addition to satisfying the completeness
relation $\sum_m M_m^\dagger M_m = I$ they also are orthogonal
projectors. Given a state $\ket{\psi}$, the probability of measuring
outcome $m$ is given by:

\begin{equation}
p(m) = \bra{\psi}P_m\ket{\psi}.
(\#eq:simple-measurement-probability)
\end{equation}

If we were to measure outcome $m$, then the state of the quantum system after the measurement would be:
$$\frac{P_m\ket{\psi}}{\sqrt{p(m)}}.$$

They have some useful properties. Just to cite one, the average value of
a projective measurement in a state $\ket{\psi}$ is defined as:
\begin{align}
    E(M)&  = \sum_m p(m)\\
& = \sum_m m \bra{\psi}P_m\ket{\psi}\\
& \bra{\psi}(\sum_m mP_m)\ket{\psi}\\
& \bra{\psi}M\ket{\psi}
\end{align}
In practice, our projective operators will be projectors in
the computational basis, i.e. $P_m = \sum_{m \in [d]} \ket{m}\bra{m}$.
From these rules, it is simple to see that the probability that a
measurement on a state $\ket{x} = \frac{1}{\norm{x}}\sum_i x_i\ket{i}$
gives outcome $i$ is $|x_i|^2/\norm{x}^2$.

::: {.proposition name="Postulate 4"}
The state space of a composite physical system is the tensor product of the state spaces of the component physical systems. Moreover, if we have systems numbered from 1 through $n$, and each state is described as $\ket{\psi_i}$, the join state of the total system is $\bigotimes_{j=1}^n \ket{\psi_i}=\ket{\psi_1}\ket{\psi_2}\dots \ket{\psi_n}$.
:::

To describe together two different quantum system we use the tensor
product. The tensor product between two vectors
$\ket{y} \in \mathbb{R}^{d_1}$ and $\ket{y} \in \mathbb{R}^{d_2}$ is a
vector $\ket{z} \in \mathbb{R}^{d_1 \times d_2}$. We can use the tensor
operation to describe the joint evolution of separate quantum system.
<!-- TODO FIXME -->

Even if it is not explicitly used much in quantum algorithms, it is useful
to recall the definition of entangled pure state.

::: {.definition name="Entangled state"}
A quantum state that cannot be expressed as a tensor product of two quantum state is said to be entangled.
:::

The same thing can be done for operators. Let $U_1$ be the unitary
describing the evolution of a quantum state $\ket{x}$ and $U_2$ the
unitary describing the evolution of a quantum state $\ket{y}$. Then
$U_1 \otimes U_2$ describes the evolution of the quantum system
$\ket{x}\otimes \ket{y}$.

From the definition of tensor product, we see that the tensor product of $n$ qubits (i.e.$n$ different $2$-dimensional vectors) is a vector of size $2^n$. Thus, to build a quantum state that stores $d$ numbers in its amplitudes we need $\lceil \log d\rceil$ qubits. More precisely, observe that for a vector $\vec{v} \in \mathcal{H}^d$ if we want to build $\ket{\vec{v]}} = \|\vec{v}\| \vec{v}$ we need only  $\lceil \log d \rceil$  numbers. If $d$ is not a power of $2$, we consider the vector padded with $0$ to the nearest power of $2$ bigger than $d$. This will allow us to build states in the form of $\frac{1}{\|\vec{v}\|}\sum_{i=0}^{\lceil \log d \rceil} v_i \ket{i}$. This fact will used a lot in our quantum algorithms, especially in quantum machine learning. We will discuss in chapter \@ref(#chap-classical-data-quantum-computers) how to create these kind of quantum states.

### Review of important statements in quantum computation

Before delving into a review of quantum algorithms, we would like to
state here a few important lemmas.

(ref:NC02) [@NC02]

::: {.lemma #hadamard-on-bitstring name="Hadamard on a bitstring (ref:NC02)"}
Let $x \in \{0,1\}^n$ be a bitstring, and $H$ be an Hadamard gate. Then:

\begin{equation}
H^{\otimes n}\ket{x} = \frac{1}{\sqrt{2^n}}\sum_{z_1, \dots z_n \in \{0,1\}^n} (-1)^{x_1z_1 + x_2z_2 + \dots + x_nz_n} \ket{z_1, \dots, z_n} = \frac{1}{\sqrt{2^n}}\sum_{z\in \{0,1\}^n} (-1)^{x^Tz} \ket{z},
\end{equation}

where $x^Tz$ is the bitwise inner product of strings in $Z_2^n$ modulo $2$.
:::

## Measuring complexity of quantum algorithms {#measuring-complexity}

This section is an attempt to organize in a coherent way some
fundamental concepts in quantum computer science. Some refereces for this section
are [@dorn2008quantum], [@DeWolf], and [@kuperberg2011another]. In this section we assume the read is comfortable with the big O notation.

Intuitively, the time complexity of an algorithm is the number of operations the computer has to perform from the beginning to the end of the execution of the algorithm. This model is easy to understand when we think about a personal computer with one CPU (more precisely, a von Neumann architecture). What is the right way of measuring the complexity of (quantum) circuits? For example, suppose to have a quantum circuit of $n$ qubits, and for each qubit we apply one Hadamard gate: what is the time complexity for running this circuit? Is it $O(n)$ or $O(1)$?. We will discuss here different ways for measuring the complexity of a quantum algorithm: the number of gates, the depth of the circuit, and the number of queries to an oracle.
The first two are a direct generalization of the measure of complexity of boolean circuits, where we measure the number of gates (serial time) or the depth of the circuit (parallel time). The third measure of complexity is the query complexity, which is the number of times we use (or query) an oracle: a circuit or a data structure that we can use as a black box. In this section we will define more formally the query complexity of a quantum algorithm, and in the next chapter we will explain possible implementations of different kinds of oracles. The *time* complexity, which is ultimately the most relevant measure of complexity of an algorithm for most practical purposes, is more tricky to define properly. If we assume that our hardware is capable of executing all the gates at the same depth in parallel, than the measure of time complexity becomes the depth of the quantum algorithm. We denote with $T(U)$ as the time complexity needed to implement a quantum circuit implementing a unitary $U$, and this is measured in terms of **number of gates**, i.e. the size of the circuit (this is quite standard see for example the section the introduction in [@ambainis2022matching]). This is a concept that bears some similarity with the clock rate of classical CPUs. If our hardware is not capable of executing all the gates at the same depth in one clock, we might have to consider the time complexity as the number of gates (eventually divided by the number of gates that can be executed at the same time). As an important exception to this rule we have the query complexity, where we consider the cost of a single query of an oracle as $O(1)$, i.e. as any other gate. This is somehow justified because for certain oracles we have efficient (i.e. with depth polylogarithmic in the size of the oracle) implementations. For this, the (often non said) assumption is that the part of the quantum computer dedicated to run the main part of the circuit has to be evaluated with serial time (i.e. the number of gates), while the part of the circuit dedicated to executing the oracle has to be evaluated for its parallel time complexity. An example of this, which we will treat in more details in the next chapter is the QRAM and the QRAG gate. This assumption allows to justifiably conflate the query complexity of an algorithm as its time complexity.


Last but not least, everything is made even more complicated by error correction. In fact, in some architecture and some error correction codes, the time complexity of the algorithm is well approximated by the number of $T$ gates of the circuit.


In this book, we use the standard notation of $\widetilde{O}$ to hide the polylogarithmic
factors in the big-O notation of the algorithms:
$O(n\log(n))=\widetilde{O}(n)$. The meticulous reader may notice that in this way we might make the notation ambiguous, as it might not be clear what is hidden in the $\widetilde{O}$ notation. The correct way of using this notation is to hide only factors that appear with a dependency bigger than the logarithm (i.e. $O(n\log(n))=\widetilde{O}(n)$ but $O(n\log(n)\log(1/\epsilon))=\widetilde{O}(n\log(1/\epsilon))$). Another possibility, to make our runtimes more precise, is to use a subscript to specify the factors that we choose to hide:
\begin{equation}
\widetilde{O}_{\epsilon,\delta}\left(\frac{1}{\epsilon}\right) = O\left(\frac{1}{\epsilon}\log(1/\epsilon)\log(1/\delta)\right)
\end{equation}


::: {.definition #quantum-oracle-access name="Quantum query or oracle access to a function"}
Let $\mathcal{H}$ be a finite-dimensional Hilbert space with basis $\{0,1\}^{n}$. Given $f:\{0,1\}^n\to\{0,1\}^m$, we say that we have quantum query access to $f$ if we have access to a unitary operator $U_f$ on $\mathcal{H}\otimes\mathbb{C}^{2^m}$ such that $U\ket{x}\ket{b} = \ket{x}\ket{b \oplus f(x)}$ for any bit string $b\in\{0,1\}^m$. One application of $U_f$ costs $T_f$ operations.
:::

::: {.definition #quantum-computation name="Quantum computation in the query model"}
Let $O_x$ be a unitary operator that encodes the input of our computation, and acts in a non-trivial way on its associated Hilbert space.
A quantum computation with $T$ queries to an oracle $O_x : \ket{i,b,z} \mapsto \ket{i,b \oplus x_i, z}$ is a sequence of unitary transformations:

$$U_TO_xU_{T-1}O_x \dots U_1O_xU_0$$

  Where each $U_t$ is a unitary that does not depend on the input of the algorithm. The output of the computation is obtained by measuring the rightmost register (by convention).
:::

Note that the second register holds the XOR of the $i$-th component of
the input with the previous state of the register (i.e. the b). This is
to make the computation reversible. Importantly, the definition
\@ref(def:quantum-oracle-access) is just an example of function for
which we can have query access. We can assume query access to unitaries
creating various kind of quantum states as output. We will see many
examples of oracles as definition \@ref(def:qram),
\@ref(def:oracle-access-adjacencymatrix),
\@ref(def:oracle-access-adjacencylist), and \@ref(def:KP-trees).

This is the so-called query model, or oracle model of (quantum) computation. An important thing here is the last statement of Definition
\@ref(def:quantum-oracle-access), on the cost of applying $U_f$, which is
$O(1)$. There are multiple reasons for working in this model, which might appear unrealistic at this time.
First, it is often the case that queries to these oracles are actually efficient
(as we will see in many example), so the query complexity is actually
equivalent (up to multiplicative polylogarithmic factors) to the depth
of the quantum circuit that is going to be executed. Another reason is
that in the oracle model is relatively simple to prove lower bounds and
results about the complexity of an algorithm in terms of the number of
queries to an oracle that encodes the input of the problem. It is
customary, for complex results in quantum algorithms to separate the
study of the query complexity of the problem and the depth or gate complexity of the
quantum circuit which is executed on the real hardware. We formalize
more this difference in the following definition.

::: {.definition #query-complexity name="Query complexity of an algorithm"}
The quantum query complexity of a quantum algorithm $\mathcal{A}$ is the number of queries to a black-box made by $\mathcal{A}$ in order to compute $f$.
:::

If we just care about the **relativized** complexity, we might limit
ourselves to compare two algorithms that solve the same problem in terms
of the number of queries to a given oracle, we might observe that one is
faster than the other. This is a **relativized** speedup. The oppositive
is an **absolute** speedup, i.e. when we also take into account the
complexity of the operations that are **not** queries to an oracle. In
the case of quantum algorithms, these might simply be the gate depth of
the circuit.

::: {.definition #circuit-complexity name="Circuit complexity or time complexity"}
The quantum circuit complexity (or time complexity) of a quantum algorithm $\mathcal{A}$ is the depth of the quantum circuit implementing $\mathcal{A}$.
:::

Quantum computing is not the only place where we measure the complexity
in terms of query to an oracle. In fact, it's sufficient to do a few
["queries"](https://www.cs.rutgers.edu/~sa1497/courses/cs514-s20/lec3.pdf)
(pun intended) on your search engine to realize that in many
computational models we have adopted this measure of computational
complexity.

**Note that the query complexity of an algorithm is a lower bound on the
gate complexity of the quantum circuit.** It is often simpler to study
first the query complexity of a quantum algorithm and then study the
time complexity. For most quantum algorithms (but not all!) the time
complexity coincides with the query complexity, up to a logarithmic
factor. Note that, if we find a way to have an oracle whose depth (i.e.
circuit complexity) is only (poly)logarithmic in the input size, then
the query complexity and the gate complexity coincide up to a negligible
polylogarithmic factor. There are some exceptions. Most notably, there
is a quantum algorithm for the important *hidden subgroup problem* with
only polynomial query complexity, while the classical counterpart has a
query complexity that is exponential in the input size. Nevertheless,
the overall time complexity of the quantum algorithm is (to date) still
exponential, and polynomial-time quantum algorithms are known only for a
few specializations of the problem.

We will clarify better some definitions that are used to describe the
probabilistic behavior of an algorithm:

::: {.definition #error-behavior name="Randomized algorithms"}
Let $f : \{0,1\}^N \mapsto \{0,1\}$ be a Boolean function. An algorithm computes $f$:

-  **exactly** if the outputs equals $f(x)$ with probability 1 for all $x \in\{0,1\}^N$
- with **zero error** if it is allowed to give the answer "UNDEFINED" with probability smaller than $1/2$ (but if the output is $0$ or $1$ it must be correct)
- with **bounded error** if the output equals $f(x)$ with probability greater than $2/3$ for all $x\in \{0,1\}^N$.

:::

A bounded error (quantum or classical) algorithm that fails with
probability $1/3$ (or any other constant smaller than $1/2$) is meant to
fail *in the worst-case*. We do not expect the algorithm to fail in the
average case, i.e. for most of the inputs (see Appendix of [@DeWolf]).

If a (quantum or classical) algorithm is said to output the right answer
in **expected** (oftain said "in expectation") running time $T$, we can
quickly create another algorithm that has **worst-case** guarantees on
the runtime. This is obtained using the Markov's inequality, i.e.
theorem \@ref(thm:markov) as follows. Run the algorithm for $kT$ steps,
i.e.. stop the execution after $kT$ steps if it hasn't terminated
already. If $X$ is the random variable of the runtime of the computation
(so $\mathbb{E}[X]=T$), then:

$$Pr\left[X > kT \right] \leq \frac{1}{k} $$ So with probability
$\geq 1-\frac{1}{k}$ we will have the output of the algorithm.

<!-- TODO: ONE SIDE BOUNDED ERROR IS MISSING -->


## Review of famous quantum algorithms {#review-famous-quantum-algos}

In this Chapter we will explore some introductory quantum algorithms.
While some of them are not directly related to data analysis nor machine learning, it is important to report them here because they help us better understand the model of quantum computation we adopt.
Others will prove to be really useful for the quantum machine learning practitioner.

### Deutsch-Josza

::: {.definition name="Constant function"}
A function $f :\{0,1\}^n \mapsto \{0,1\}$ is constant if $f(x)=0 \forall x \in \{0,1\}^n$ or $f(x)=1 \forall x \in \{0,1\}^n$.
:::

::: {.definition name="Balanced function"}
A function $f :\{0,1\}^n \mapsto \{0,1\}$ is balanced if $f(x)=0$ for half of the inputs and $f(x)=1$ for the other half.
:::

(ref:deutsch1992rapid) [@deutsch1992rapid]

::: {.theorem name="Deutsch-Josza (ref:deutsch1992rapid)"}
Assume to have quantum access (as definition \@ref(def:quantum-oracle-access) ) to a unitary $U_f$ that computes the function $f :\{0,1\}^n \mapsto \{0,1\}$, which we are promised to be either constant or balanced. There is a quantum algorithm that decides which is the case with probabiliy $1$, using $U_f$ only once and using $O(\log(n))$ other gates.
:::

::: {.proof}
We start our quantum computer initializing $n$ qubit as $\ket{0}$ state follwed by a single ancilla qubit initialized in state $\ket{1}$, which we will use for the phase-kickback. Then, we apply the Hadamard transform on each of them. Mathematically, we are performing the following mapping:

\begin{equation}
\ket{0}^{\otimes n}\ket{1} \mapsto \left(\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n} \ket{x} \right)\ket{-}
\end{equation}
Now we apply $U_f$ using the first register (i.e. the first $n$ qubits) as input and the ancilla register (the last qubit) as output. Our quantum computer is now in the state
$$\left(\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}(-1)^{f(x)}\ket{x} \right)\ket{-}$$
Now we apply $n$ Hadamard gates to the $n$ qubits in the first registers. Recalling lemma \@ref(lem:hadamard-on-bitstring), this gives the state

$$\left(\frac{1}{2^n} \sum_{x\in\{0,1\}^n}(-1)^{f(x)} \sum_{j \in \{0,1\}^n }(-1)^{xj} \ket{j} \right) \ket{-} = \left(\frac{1}{2^n} \sum_{x\in\{0,1\}^n}\sum_{j \in \{0,1\}^n} (-1)^{f(x)+ xj} \ket{j} \right)\ket{-}$$

In this state, note that the normalization factor has changed from $\frac{1}{\sqrt{2^n}}$ to $\frac{1}{2^n}$, and recall that $(-1)^{xj}$ is read as $(-1)^{ \sum_{p} x_pj_p \text{mod}2 }$.
The key idea of the proof of this algorithm lies in asking the right question to the previous state: what is the probability of measuring the state $\ket{0}^n$ in the first register? The answer to this question will conclude the proof of this theorem. Before looking at the probability, observe that the amplitude of the state $\ket{j=0}$ we will see that it is just $\frac{1}{2^n}\sum_{x}(-1)^{f(x)}$, as $x^Tj=0$ if $j=0_1\dots 0_n$, for all $x$. Then,

\begin{equation}
\frac{1}{2^n} \sum_{i \in \{0,1\}^n } (-1)^f(x) = \begin{cases}
      1 & \text{if } f(x)=0 \forall x   \\
      -1 & \text{if } f(x)=1 \forall x  \\
      0 & \text{if } f(x) \text{is balanced}
      \end{cases}
      \end{equation}


To conclude, reckon that if the function $f$ is constant (first two cases), we will measure $\ket{0}^{\otimes n}$ with probability $1$, and if the function is balanced, we will measure some bitstring of $n$ bits that is different than the string $0_1\dots 0_n$.

:::

It's simple to see that if we want to solve this problem with a
classical *deterministic* algorithm, we need exactly $2^n/2 + 1$
queries. However, with the usage of a randomized algorithm we can
drastically reduce the number of queries by admitting a small
probability of failure.

::: {.exercise}
Can you think of an efficient randomized classical algorithm for solving this problem? Perhaps you can use the tools in the Appendix for randomized algorithms.
:::

We now turn our attention to the first learning problem of this book.
This is rarely stressed that the following algorithm can be interpreted
as a learning algorithm.

### Bernstein-Vazirani

::: {.theorem name="Bernstein-Vazirani"}
Assume to have quantum access (as definition \@ref(def:quantum-oracle-access) )  to a unitary $U_f$ that computes the function $f :\{0,1\}^n \mapsto \{0,1\}$, which computes $f_a(x) = (x,a) = ( \sum_i^n x_i a_i )\mod 2$ for a secret string $a \in \{0,1\}^n$.  There is a quantum algorithm that learns $a$ with probability $1$, using $U_f$ only once and $O(\log(n))$ other gates.
:::

::: {.proof}
The algorithm follows exactly the same steps as the Deutsch-Josza algorithm. The proof is slightly different, and start by noting that, after the application of the oracle $U_f$, the register of our quantum computer is in the following state:


\begin{equation}
\left(\frac{1}{\sqrt{2^n}} \sum_{x\in\{0,1\}^n}(-1)^{f(x)}  \ket{x} \right) \ket{-} = \left(\frac{1}{\sqrt{2^n}} \sum_{x\in\{0,1\}^n} (-1)^{a^T x}\ket{x} \right)\ket{-}
\end{equation}

Now we resort again to Lemma \@ref(lem:hadamard-on-bitstring), and we use the fact that the Hadamard it is also a self-adjoint operator (i.e. it is the inverse of itself: $H^2 = I$). Thus applying $n$ Hadamard gates to the first register leads to the state $\ket{a}$ deterministically.

:::

```{=html}
<!-- TODO
add proof of the lower bounds for the runtime of classical algorithms.
-->
```
::: {.exercise}
Can you think of an efficient randomized classical algorithm for solving Berstain-Vazirani problem? You can use the tools in the Appendix for randomized algorithms.
:::

Other material for learning about Deutsch-Josza and Bernstein-Vazirani
algorithms are the lecture notes of Ronald de Wolf that you can find
[here](https://homepages.cwi.nl/~rdewolf/qcnotes.pdf).

```{r, echo=FALSE, fig.cap="Expect to see here Simon's algorithm - [Contribute here!](https://github.com/scinaw/quantumalgorithms.org)"}
knitr::include_graphics("images/wip.png")
```

### Hadamard test

Let $U$ be a unitary acting on $n$ qubits, and $\ket{\psi}$ a quantum
state on $n$ qubit (generated by another unitary $V$). We also require
to be able to apply the controlled version of the unitary $U$. Then, the
Hadamard test is a quantum circuit that we can use to estimate the value
of $\braket{\psi| U | \psi}$. The circuit is very simple, it consists in a
Hadamard gate applied on an ancilla qubit, the controlled application of
the unitary $U$ and another Hadamard gate.

The initial operation leads to
$(H\otimes V) \ket{0}\ket{0} = \ket{+}\ket{\psi}$, then we have:
$$
\begin{aligned}
\ket{\psi_{\text{final}}} & =(H\otimes I)(cU )\ket{+}\ket{\psi} = (H\otimes I)\frac{1}{\sqrt{2}}\left(\ket{0}\ket{\psi}+\ket{1}U\ket{\psi} \right) \\
 & =\frac{1}{2}\left(\ket{0}\left(\ket{\psi} + U\ket{\psi} \right) + \ket{1}\left(\ket{\psi} - U\ket{\psi} \right) \right)
\end{aligned}
$$

<!-- \begin{equation} -->
<!-- \ket{\psi_{\rm{final}}} & =(H\otimes I)(cU )\ket{+}\ket{\psi} = (H\otimes I)\frac{1}{\sqrt{2}}\left(\ket{0}\ket{\psi}+\ket{1}U\ket{\psi} \right) \\ -->
<!--  & =\frac{1}{2}\left(\ket{0}\left(\ket{\psi} + U\ket{\psi} \right) + \ket{1}\left(\ket{\psi} - U\ket{\psi} \right) \right) -->
<!-- \end{equation} -->


Note that the last state could be written equivalently, by just
factoring out the $\ket{\psi}$ state as
$\ket{\psi_{\text{final}}}=\frac{1}{2}\left(\ket{0}(I+U)\ket{\psi} + \ket{1}(I-U)\ket{\psi} \right)$.
The probability of measuring $0$ in the first qubit is:

\begin{align}
p(0)= & \norm{\frac{1}{2}(I+U)\ket{\psi} }_2^2 = \frac{1}{4} \left(\bra{\psi} + \bra{\psi}U^\dagger \right)\left(\ket{\psi} + U\ket{\psi} \right) \\
=& \frac{2+\braket{\psi(U+U^\dagger)\psi}}{4} = \frac{2+2\text{Re}(\braket{\psi|U|\psi})}{4}
\end{align}

<!-- EXPLAIN WHY REAL PART! -->

Where we used Postulate \@ref(prp:postulate3) with the observable
$\ket{0}\bra{0}\otimes I$. The probability of measuring $1$ in the first
register follows trivially.

::: {.exercise}
Can you tell what is the expected value of the observable $Z$ of the ancilla qubit? Remember that the possible outcome of the observable $Z$ are $\{+1, -1\}$.
:::

<!-- Solution: it's just $\braket{\psi U \psi}$ -->

However, we might be interested in the imaginary part of
$\braket{\psi|U|\psi}$. To estimate that, we need to slightly change the
circuit. After the first Hadamard gate, we apply on the ancilla qubit a
phase gate $S$, which gives to the state $\ket{1}$ a phase of $-i$. To get
the intuition behind this, let's recall that the imaginary part of a
complex number $z=(a+ib)$ is defined as:
$\text{Im}(z)= \frac{z-z^\ast}{2i}=\frac{i(z-z^\ast)}{-2}= \frac{-2b}{-2} =b$,
where after the definition, we just complicated the series of equations
by multiplying the numerator and denominator by $i$, a trick that we
will use later. The rest of the circuit of the Hadamard test stays the
same. The evolution of our state in the quantum computer is the
following:


\begin{align}
\ket{\psi_{\text{final}}}& =(H\otimes I)(cU )\left( \ket{0} - i\ket{1}\right)\ket{\psi} = (H\otimes I)\frac{1}{\sqrt{2}}\left(\ket{0}\ket{\psi}-i\ket{1}U\ket{\psi} \right) \\
 & = \frac{1}{2}\left(\ket{0}\left(\ket{\psi} -iU\ket{\psi} \right) + \ket{1}\left(\ket{\psi} + i U\ket{\psi} \right) \right)
\end{align}

The probability of measuring $0$ is given by the following equation.


\begin{align}
p(0)=\frac{1}{4}\left(\bra{\psi}+iU\bra{\psi} \right)\left(\ket{\psi}-iU\ket{\psi} \right) = \frac{1}{4} \left(2 - i\braket{\psi|U|\psi} + i \braket{\psi|U^\dagger|\psi} \right)
\end{align}


Note that when taking the conjugate of our state, we changed the sign of
$i$. We now have only to convince ourselves that
$-i\braket{\psi|U|\psi} + i \braket{\psi|U^\dagger|\psi} = i\braket{\psi|U^\dagger -U|\psi}$
is indeed the real number corresponding to
$2\text{Im}(\braket{\psi| U|\psi})$, and thus the whole equation can be a
probability.

::: {.exercise}
Can you check if the $S$ gate that we do after the first Hadamard can be performed before the last Hadamard gate instead?
:::

### Modified Hadamard test

In this section we complicate a little the results obtained in the
previous one, by finding the number of samples that we need to draw out
of a circuit in order to estimate the expected value or the probability
of interested with a certain level of accuracy and with a certain
probability.

::: {.theorem #modified-hadamard-test-no-amplification name="Modified Hadamard test (no amplitude amplification)"}
Assume to have access to a unitary $U_1$ that produces a state $U_1 \ket{0} = \ket{\psi_1}$ and a unitary $U_2$ that produces a state $\ket{\psi_2}$, where $\ket{\psi_1},\ket{\psi_2} \in \mathbb{C}^N$ for $N=2^n, n\in\mathbb{N}$. There is a quantum algorithm that allows to estimate the quantity $\braket{\psi_1|\psi_2}$ with additive precision $\epsilon$ using controlled applications of $U_1$ and $U_2$ $O(\frac{\log(1/\delta)}{\epsilon^2})$ times, with probability $1-\delta$
:::

::: {.proof}
Create a state $\ket{0}\ket{0}$ where the first register is just an ancilla qubit, and the second register has $n$ qubits. Then, apply an Hadamard gate to the first qubit, so to obtain $\ket{+}\ket{0}$. Then, controlled on the first register being $0$, we apply the unitary $U_1$, and controlled on the register being $1$, we apply the unitary $U_2$. Then, we apply again the Hadamard gate on the ancilla qubit. The state that we obtain is the following:

\begin{align}
(H\otimes I ) \frac{1}{\sqrt{2}}\left( \ket{0}\ket{\psi_1} + \ket{1}\ket{\psi_2} \right) \\
 = \frac{1}{2}\left(\ket{0}(\ket{\psi_1} + \ket{\psi_2}) +  \ket{1}(\ket{\psi_1} - \ket{\psi_2})  \right)
\end{align}


Again, now it is easy to state that the probability of measuring $0$ is:

\begin{equation}
p(0)=\frac{2+2\text{Re}[\braket{\psi_1|\psi_2}]}{4}
\end{equation}

We conclude the proof by recalling the Chernoff bound in theorem \@ref(thm:chernoff-bound2), as we did for the proof of the swap test.

:::

<!-- TODO add circuit for complex part of inner product.-->

Can you think of the reasons that might lead one to prefer the swap test
over the Hadamard test, or vice versa? At the end of the day, aren't they
both computing the same thing? For instance, note that for the Hadamard
test, we are requiring the ability to call the *controlled* version of
the unitaries $U_1$, and $U_2$, while for the swap test, we can just
treat them as black-boxes: these can be quantum states that we obtain from
a quantum process, or that we obtain from a quantum communication
channel.

### Swap test

The swap test was originally proposed in [@buhrman2001quantum], in the
context of quantum fingerprinting, but it has been quickly extended to
many other context. For us, the swap test is a way to obtain an estimate
of an inner product between two quantum states. The difference between
the swap test and the Hadamard test is that in this case we don't assume
to have access to the unitary creating the states, hence we cannot
perform controlled operations with this unitary. You can think that we
receive the states from a third-party, i.e. via a communication
protocol.

::: {.theorem #swap-test-no-amplification name="Swap test (no amplitude amplification)"}
Assume to have access to a unitary $U_1$ that produces a state $U_1 \ket{0} = \ket{\psi_1}$ and a unitary $U_2$ that produces a state $\ket{\psi_2}$, where $\ket{\psi_1},\ket{\psi_2} \in \mathbb{C}^N$ for $N=2^n, n\in\mathbb{N}$. There is a quantum algorithm that allows to estimate the quantity $|\braket{\psi_1|\psi_2}|^2$ with additive precision $\epsilon$ using $U_1$ and $U_2$ $O(\frac{\log(1/\delta)}{\epsilon^2})$ times with probability $1-\delta$
:::

::: {.proof}
Create a state $\ket{0}\ket{0}\ket{0}$ where the first register is just an ancilla qubit, and the second and third register have $n$ qubits each. Then, apply an Hadamard gate to the first qubit, so to obtain $\ket{+}\ket{0}\ket{0}$. Then, apply $U_1$ and $U_2$ to the second and third register, and then apply a controlled swap gate controlled on the ancilla qubit, targeting the two registers. More precisely, we apply $n$ controlled swap gates, each controlling a single qubit of the second and third register. Thus, we obtain the state:

  \begin{equation}
\frac{1}{\sqrt{2}}\left[\ket{0}(\ket{\psi_1}\ket{\psi_2}) + \ket{1}(\ket{\psi_2}\ket{\psi_1})  \right]
\end{equation}

  we now apply another Hadamard gate on the ancilla qubit, in order to obtain the following state:

\begin{align}
\ket{\phi}=& \frac{1}{\sqrt{2}}\left[\frac{1}{\sqrt{2}}\left(\ket{0}(\ket{\psi_1}\ket{\psi_2}) + \ket{1}(\ket{\psi_1}\ket{\psi_2})\right) + \frac{1}{\sqrt{2}}\left(\ket{0}(\ket{\psi_2}\ket{\psi_1}) - \ket{1}(\ket{\psi_2}\ket{\psi_1}) \right)  \right] \\
 =& \frac{1}{2}\left[\ket{0}\left(\ket{\psi_1}\ket{\psi_2}) + \ket{\psi_2}\ket{\psi_1} \right) +                          \ket{1}\left(\ket{\psi_1}\ket{\psi_2}) - \ket{\psi_2}\ket{\psi_1}\right)\right]
\end{align}

Now we consider the probability of measuring $0$ and $1$ in the ancilla qubit. More in detail, we want to estimate $p(0)=\bra{\phi}M_0\ket{\phi}$. For this, we recall our Postulate \@ref(prp:postulate3), and more precisely equation \@ref(eq:simple-measurement-probability), with $M_0=\ket{0}\bra{0}\otimes I$, where $I$ is the identiy operator over $n$ qubits. It is simple to see that $p(0)=\frac{2-2|\braket{\psi_1|\psi_2}|^2}{4}$.

By repeating this measurement $O(\log(1/\delta)/\epsilon^2)$ times, duly following the statement of the Chernoff bound in theorem \@ref(thm:chernoff-bound2), we have that the number of samples needed to obtain an error $\epsilon$ with probability $1-\delta$ is $t=\frac{\log(1/\delta)}{2\epsilon^2}$. Once we have obtained an estimate of $p(0)$, we can estimate the sought-after quantity of interest as $|\braket{\psi_1|\psi_2}|^2 = 1-2p(0)$.
:::

::: {.exercise name="Obtain the absolute value of the inner product"}
In the previous theorem we obtain an estimate of $|\braket{\psi_1|\psi_2}|^2$ with a certain error $\epsilon$. If we just take the square root of that number, what is the error in the estimation of $|\braket{\psi_1|\psi_2}|$? You are encouraged to read the section in the appendix \@ref(error-prop) on error propagation.
:::


<!-- <!-- ## Exercises --> -->

<!-- ```{=html} -->
<!-- <!-- -->
<!-- # TODO Add exercise on rewriting the swap test -->
<!-- # It's done in the draft/distances.Rmd but the images needs to be worked out properly. -->
<!-- # labels: good first issue, help wanted -->
<!-- --> -->
<!-- ``` -->