Let us begin with a precise mathematical definition for purifications.
Definition
Suppose X is a system in a state represented by a density matrix ρ, and ∣ψ⟩ is a quantum state vector of a pair (X,Y) that leaves ρ when Y is traced out:
$$\rho = \operatorname{Tr}_{\mathsf{Y}}(|\psi\rangle\langle\psi|).$$
The state vector ∣ψ⟩ is then said to be a purification of ρ.
The pure state ∣ψ⟩⟨ψ∣, expressed as a density matrix rather than a quantum state vector, is also commonly referred to as a purification of ρ when the equation in the definition is true, but we'll generally use the term to refer to a quantum state vector.
The term purification is also used more generally when the ordering of the systems is reversed, when the names of the systems and states are different (of course), and when there are more than two systems.
For instance, if ∣ψ⟩ is a quantum state vector representing a pure state of a compound system (A,B,C), and the equation
$$\rho = \operatorname{Tr}_{\mathsf{B}}(|\psi\rangle\langle\psi|)$$
is true for a density matrix ρ representing a state of the system (A,C), then ∣ψ⟩ is still referred to as a purification of ρ.
For the purposes of this lesson, however, we'll focus on the specific form described in the definition.
Properties and facts concerning purifications, according to this definition, can typically be generalized to more than two systems by re-ordering and partitioning the systems into two compound systems, one playing the role of X and the other playing the role of Y.
Suppose that X and Y are any two systems and ρ is a given state of X.
We will prove that there exists a quantum state vector ∣ψ⟩ of (X,Y) that purifies ρ, which is another way of saying that ∣ψ⟩ is a purification of ρ, provided that the system Y is large enough.
In particular, if Y has at least as many classical states as X, then a purification of this form necessarily exists for every state ρ.
Fewer classical states of Y are required for some states ρ;
in general, rank(ρ) classical states of Y are necessary and sufficient for the existence of a quantum state vector of (X,Y) that purifies ρ.
Consider first any expression of ρ as a convex combination of n pure states, for any positive integer n.
$$\rho = \sum_{a=0}^{n-1} p_a\, |\phi_a\rangle\langle\phi_a|$$
In this expression, $(p_0,\ldots,p_{n-1})$ is a probability vector and $|\phi_0\rangle,\ldots,|\phi_{n-1}\rangle$ are quantum state vectors of $\mathsf{X}$.
One way to obtain such an expression is through the spectral theorem, in which case $n$ is the number of classical states of $\mathsf{X}$, $p_0,\ldots,p_{n-1}$ are the eigenvalues of $\rho$, and $|\phi_0\rangle,\ldots,|\phi_{n-1}\rangle$ are orthonormal eigenvectors corresponding to these eigenvalues.
There's actually no need to include the terms corresponding to the zero eigenvalues of $\rho$ in the sum, which allows us to alternatively choose $n=\operatorname{rank}(\rho)$ and $p_0,\ldots,p_{n-1}$ to be the non-zero eigenvalues of $\rho$.
This is the minimum value of n for which an expression of ρ taking the form above exists.
To be clear, it is not necessary that the chosen expression of ρ, as a convex combination of pure states, comes from the spectral theorem — this is just one way to obtain such an expression.
In particular, $n$ could be any positive integer, the unit vectors $|\phi_0\rangle,\ldots,|\phi_{n-1}\rangle$ need not be orthogonal, and the probabilities $p_0,\ldots,p_{n-1}$ need not be eigenvalues of $\rho$.
We can now identify a purification of ρ as follows.
$$|\psi\rangle = \sum_{a=0}^{n-1} \sqrt{p_a}\, |\phi_a\rangle \otimes |a\rangle$$
Here we're making the assumption that the classical states of Y include 0,…,n−1.
If they do not, an arbitrary choice for n distinct classical states of Y can be substituted for 0,…,n−1.
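Verifying that this is indeed a purification of $\rho$ is a simple matter of computing the partial trace, which can be done in the following two equivalent ways.
$$\operatorname{Tr}_{\mathsf{Y}}(|\psi\rangle\langle\psi|) = \sum_{a=0}^{n-1} (\mathbb{I}_{\mathsf{X}} \otimes \langle a|)\, |\psi\rangle\langle\psi|\, (\mathbb{I}_{\mathsf{X}} \otimes |a\rangle) = \sum_{a=0}^{n-1} p_a\, |\phi_a\rangle\langle\phi_a| = \rho$$
$$\operatorname{Tr}_{\mathsf{Y}}(|\psi\rangle\langle\psi|) = \sum_{a,b=0}^{n-1} \sqrt{p_a}\sqrt{p_b}\; |\phi_a\rangle\langle\phi_b| \operatorname{Tr}(|a\rangle\langle b|) = \sum_{a=0}^{n-1} p_a\, |\phi_a\rangle\langle\phi_a| = \rho$$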
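The construction above can also be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original lesson: the helper names `purify` and `partial_trace_Y` and the three-state ensemble are hypothetical choices made for illustration, and the classical states of Y are assumed to be 0, …, n−1.

```python
import numpy as np

def purify(probs, states):
    """Return sum_a sqrt(p_a) |phi_a> (x) |a> as a flat vector on X (x) Y.

    probs  : probability vector (p_0, ..., p_{n-1})
    states : n unit vectors of equal dimension (they need not be orthogonal)
    """
    n = len(probs)
    dim_x = len(states[0])
    psi = np.zeros(dim_x * n, dtype=complex)
    for a, (p, phi) in enumerate(zip(probs, states)):
        ket_a = np.zeros(n)
        ket_a[a] = 1.0                      # standard basis vector |a> of Y
        psi += np.sqrt(p) * np.kron(phi, ket_a)
    return psi

def partial_trace_Y(psi, dim_x, dim_y):
    """Trace out Y from |psi><psi|, where psi is a vector on X (x) Y."""
    amp = psi.reshape(dim_x, dim_y)         # amp[x, y] = amplitude of |x>|y>
    return amp @ amp.conj().T

# Hypothetical ensemble: three non-orthogonal qubit states, so n = 3 > rank(rho).
probs = [0.2, 0.3, 0.5]
states = [np.array([1.0, 0.0]),
          np.array([0.0, 1.0]),
          np.array([1.0, 1.0]) / np.sqrt(2)]

rho = sum(p * np.outer(phi, phi.conj()) for p, phi in zip(probs, states))
psi = purify(probs, states)
rho_from_psi = partial_trace_Y(psi, dim_x=2, dim_y=3)

print(np.allclose(rho, rho_from_psi))
```

The reshape relies on `np.kron(phi, ket_a)` placing the X index first, so the matrix product in `partial_trace_Y` sums over the Y index only.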
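For example, suppose that X and Y are both qubits and that $\rho$ has the spectral decomposition
$$\rho = \cos^2(\pi/8)\,|\psi_{\pi/8}\rangle\langle\psi_{\pi/8}| + \sin^2(\pi/8)\,|\psi_{5\pi/8}\rangle\langle\psi_{5\pi/8}|,$$
where $|\psi_\theta\rangle = \cos(\theta)|0\rangle + \sin(\theta)|1\rangle$.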
The quantum state vector
$$\cos(\pi/8)\,|\psi_{\pi/8}\rangle \otimes |0\rangle + \sin(\pi/8)\,|\psi_{5\pi/8}\rangle \otimes |1\rangle,$$
which describes a pure state of the pair (X,Y), is therefore a purification of ρ.
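This example can be reproduced numerically. The sketch below is an illustration rather than part of the original lesson. It assumes that ρ is the qubit state whose orthonormal eigenvectors are the vectors cos(θ)∣0⟩+sin(θ)∣1⟩ at θ = π/8 and θ = 5π/8, with eigenvalues cos²(π/8) and sin²(π/8), which is what the purification above implies. The function name `psi_theta` is a hypothetical helper.

```python
import numpy as np

def psi_theta(theta):
    """Return the qubit state vector cos(theta)|0> + sin(theta)|1>."""
    return np.array([np.cos(theta), np.sin(theta)])

c, s = np.cos(np.pi / 8), np.sin(np.pi / 8)
v0, v1 = psi_theta(np.pi / 8), psi_theta(5 * np.pi / 8)

# Assumed spectral decomposition of rho (eigenvalues c^2 and s^2).
rho = c**2 * np.outer(v0, v0) + s**2 * np.outer(v1, v1)

# The purification from the text: cos(pi/8) v0 (x) |0> + sin(pi/8) v1 (x) |1>.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
psi = c * np.kron(v0, ket0) + s * np.kron(v1, ket1)

# Trace out Y: reshape so that amp[x, y] is the amplitude of |x>|y>.
amp = psi.reshape(2, 2)
reduced = amp @ amp.conj().T

is_purification = np.allclose(reduced, rho)
print(is_purification)
print(np.round(rho, 4))
```

Under these assumptions, ρ works out to the matrix with rows (3/4, 1/4) and (1/4, 1/4).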
Alternatively, we can write
$$\rho = \frac{1}{2}\,|0\rangle\langle 0| + \frac{1}{2}\,|{+}\rangle\langle{+}|.$$
This is a convex combination of pure states but not a spectral decomposition because ∣0⟩ and ∣+⟩ are not orthogonal and 1/2 is not an eigenvalue of ρ.
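Nevertheless, the quantum state vector
$$\frac{1}{\sqrt{2}}\,|0\rangle \otimes |0\rangle + \frac{1}{\sqrt{2}}\,|{+}\rangle \otimes |1\rangle$$
is a purification of $\rho$.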
Next, we will discuss Schmidt decompositions, which are expressions of quantum state vectors of pairs of systems that take a certain form.
Schmidt decompositions are closely connected with purifications, and they're very useful in their own right.
Indeed, when reasoning about a given quantum state vector ∣ψ⟩ of a pair of systems, the first step is often to identify or consider a Schmidt decomposition of this state.
Definition
Let ∣ψ⟩ be a given quantum state vector of a pair of systems (X,Y). A Schmidt decomposition of ∣ψ⟩ is an expression of the form
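$$|\psi\rangle = \sum_{a=0}^{r-1} \sqrt{p_a}\, |x_a\rangle \otimes |y_a\rangle,$$
where $p_0,\ldots,p_{r-1}$ are positive real numbers summing to $1$ and both of the sets $\{|x_0\rangle,\ldots,|x_{r-1}\rangle\}$ and $\{|y_0\rangle,\ldots,|y_{r-1}\rangle\}$ are orthonormal.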
The values
$$\sqrt{p_0},\ldots,\sqrt{p_{r-1}}$$
in a Schmidt decomposition of $|\psi\rangle$ are known as its Schmidt coefficients, which are uniquely determined (up to their ordering): they're the only positive real numbers that can appear in such an expression of $|\psi\rangle$.
The sets
$$\{|x_0\rangle,\ldots,|x_{r-1}\rangle\} \quad\text{and}\quad \{|y_0\rangle,\ldots,|y_{r-1}\rangle\},$$
on the other hand, are not uniquely determined, and the freedom one has in choosing these sets of vectors will be clarified in the explanation that follows.
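As a small illustration of this non-uniqueness (an example chosen here, not taken from the original text), consider the two-qubit state $|\phi^+\rangle = (|0\rangle\otimes|0\rangle + |1\rangle\otimes|1\rangle)/\sqrt{2}$. Both expressions below are Schmidt decompositions of it with $r = 2$ and $p_0 = p_1 = 1/2$, but they use different orthonormal sets.

```latex
\begin{aligned}
|\phi^+\rangle
  &= \sqrt{\tfrac{1}{2}}\; |0\rangle \otimes |0\rangle
   + \sqrt{\tfrac{1}{2}}\; |1\rangle \otimes |1\rangle \\
  &= \sqrt{\tfrac{1}{2}}\; |{+}\rangle \otimes |{+}\rangle
   + \sqrt{\tfrac{1}{2}}\; |{-}\rangle \otimes |{-}\rangle,
\qquad |{\pm}\rangle = \tfrac{1}{\sqrt{2}}\bigl(|0\rangle \pm |1\rangle\bigr).
\end{aligned}
```

The second line follows by expanding $|{+}\rangle\otimes|{+}\rangle$ and $|{-}\rangle\otimes|{-}\rangle$ in the standard basis: the cross terms $|0\rangle\otimes|1\rangle$ and $|1\rangle\otimes|0\rangle$ cancel in the sum.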
We'll now verify that a given quantum state vector ∣ψ⟩ does indeed have a Schmidt decomposition, and in the process, we'll learn how to find one.
Consider first an arbitrary (not necessarily orthogonal) basis $\{|x_0\rangle,\ldots,|x_{n-1}\rangle\}$ of the vector space corresponding to the system $\mathsf{X}$.
Because this is a basis, there will always exist a uniquely determined selection of vectors $|z_0\rangle,\ldots,|z_{n-1}\rangle$ for which the following equation is true.
$$|\psi\rangle = \sum_{a=0}^{n-1} |x_a\rangle \otimes |z_a\rangle \tag{1}$$
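For example, suppose $\{|x_0\rangle,\ldots,|x_{n-1}\rangle\}$ is the standard basis associated with $\mathsf{X}$.
Assuming the classical state set of $\mathsf{X}$ is $\{0,\ldots,n-1\}$, this means that $|x_a\rangle = |a\rangle$ for each $a\in\{0,\ldots,n-1\}$, and we find that
$$|\psi\rangle = \sum_{a=0}^{n-1} |a\rangle \otimes |z_a\rangle$$
when
$$|z_a\rangle = (\langle a| \otimes \mathbb{I}_{\mathsf{Y}})\,|\psi\rangle$$
for each $a\in\{0,\ldots,n-1\}$.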
We frequently consider expressions like this when contemplating a standard basis measurement of X.
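The standard-basis case is easy to reproduce numerically. The following minimal sketch is not from the original text: the dimensions, the random state, and the variable names are arbitrary illustrative choices. It stores a state vector of (X,Y) as an n×m array of amplitudes, so that row a is the vector obtained by applying ⟨a∣ tensored with the identity on Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                                   # assumed sizes of X and Y

# A random normalized state vector of (X, Y), for illustration only.
psi = rng.normal(size=n * m) + 1j * rng.normal(size=n * m)
psi /= np.linalg.norm(psi)

# amp[a, y] is the amplitude of |a>|y>, so row a is z_a = (<a| (x) I_Y)|psi>.
amp = psi.reshape(n, m)
z = [amp[a] for a in range(n)]

# Cross-check against the explicit operator <a| (x) I_Y.
for a in range(n):
    bra_a = np.zeros((1, n))
    bra_a[0, a] = 1.0
    assert np.allclose(np.kron(bra_a, np.eye(m)) @ psi, z[a])

# Equation (1) with the standard basis: sum_a |a> (x) |z_a> recovers psi.
rebuilt = sum(np.kron(np.eye(n)[a], z[a]) for a in range(n))

# Probability of each outcome a for a standard basis measurement of X.
outcome_probs = [np.linalg.norm(z_a) ** 2 for z_a in z]
print(np.allclose(rebuilt, psi), np.isclose(sum(outcome_probs), 1.0))
```

The squared norms of these vectors are the outcome probabilities of a standard basis measurement of X, which is why the expression appears so often in that setting.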
It's important to note that the formula
$$|z_a\rangle = (\langle a| \otimes \mathbb{I}_{\mathsf{Y}})\,|\psi\rangle$$
for the vectors $|z_0\rangle,\ldots,|z_{n-1}\rangle$ in this example only works because $\{|0\rangle,\ldots,|n-1\rangle\}$ is an orthonormal basis.
In general, if $\{|x_0\rangle,\ldots,|x_{n-1}\rangle\}$ is a basis that is not necessarily orthonormal, then the vectors $|z_0\rangle,\ldots,|z_{n-1}\rangle$ are still uniquely determined by the equation (1), but a different formula is needed.
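One way to find them is first to identify vectors $|w_0\rangle,\ldots,|w_{n-1}\rangle$ so that the equation
$$\langle w_a | x_b\rangle = \begin{cases} 1 & a = b \\ 0 & a \neq b \end{cases}$$
is satisfied for all $a,b\in\{0,\ldots,n-1\}$, at which point we have
$$|z_a\rangle = (\langle w_a| \otimes \mathbb{I}_{\mathsf{Y}})\,|\psi\rangle.$$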
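One concrete way to obtain such dual vectors, sketched below under stated assumptions, is to place the basis vectors in the columns of a matrix and invert it: row a of the inverse then serves as the bra of the a-th dual vector. This is an illustrative NumPy sketch rather than a method prescribed by the text, and the random basis is assumed to be linearly independent (which holds with probability 1 for the Gaussian matrix used here).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2                                   # assumed sizes of X and Y

# Columns of X_basis form a (generally non-orthonormal) basis |x_0>, ..., |x_{n-1}>.
X_basis = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# Row a of W is the bra <w_a|: W @ X_basis = I says <w_a|x_b> = 1 if a == b else 0.
W = np.linalg.inv(X_basis)

# A random normalized state vector of (X, Y), for illustration only.
psi = rng.normal(size=n * m) + 1j * rng.normal(size=n * m)
psi /= np.linalg.norm(psi)
amp = psi.reshape(n, m)                       # amp[x, y] = amplitude of |x>|y>

# Row a of Z is z_a = (<w_a| (x) I_Y)|psi>.
Z = W @ amp

# Equation (1): sum_a |x_a> (x) |z_a> should reproduce psi.
rebuilt = sum(np.kron(X_basis[:, a], Z[a]) for a in range(n))
print(np.allclose(rebuilt, psi))
```

When the basis is orthonormal, the inverse equals the conjugate transpose, and the formula reduces to the one used for the standard basis.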
For a given basis $\{|x_0\rangle,\ldots,|x_{n-1}\rangle\}$ of the vector space corresponding to $\mathsf{X}$, the uniquely determined vectors $|z_0\rangle,\ldots,|z_{n-1}\rangle$ for which the equation (1) is satisfied won't necessarily satisfy any special properties, even if $\{|x_0\rangle,\ldots,|x_{n-1}\rangle\}$ happens to be an orthonormal basis.
If, however, we choose $\{|x_0\rangle,\ldots,|x_{n-1}\rangle\}$ to be an orthonormal basis of eigenvectors of the reduced state
$$\rho = \operatorname{Tr}_{\mathsf{Y}}(|\psi\rangle\langle\psi|),$$
then something interesting happens.
Specifically, for the uniquely determined collection $\{|z_0\rangle,\ldots,|z_{n-1}\rangle\}$ for which the equation (1) is true, we find that this collection must be orthogonal.
In greater detail, consider a spectral decomposition of ρ.
$$\rho = \sum_{a=0}^{n-1} p_a\, |x_a\rangle\langle x_a|$$
Here we're denoting the eigenvalues of $\rho$ by $p_0,\ldots,p_{n-1}$ in recognition of the fact that $\rho$ is a density matrix, so the vector of eigenvalues $(p_0,\ldots,p_{n-1})$ forms a probability vector, while $\{|x_0\rangle,\ldots,|x_{n-1}\rangle\}$ is an orthonormal basis of eigenvectors corresponding to these eigenvalues.
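To see that the unique collection $\{|z_0\rangle,\ldots,|z_{n-1}\rangle\}$ for which the equation (1) is true is necessarily orthogonal, we can begin by computing the partial trace.
$$\operatorname{Tr}_{\mathsf{Y}}(|\psi\rangle\langle\psi|) = \sum_{a,b=0}^{n-1} |x_a\rangle\langle x_b| \operatorname{Tr}(|z_a\rangle\langle z_b|) = \sum_{a,b=0}^{n-1} \langle z_b | z_a\rangle\, |x_a\rangle\langle x_b|$$
This expression must agree with the spectral decomposition of $\rho$.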
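Because $\{|x_0\rangle,\ldots,|x_{n-1}\rangle\}$ is a basis, we conclude that the set of matrices
$$\bigl\{ |x_a\rangle\langle x_b| \,:\, a,b\in\{0,\ldots,n-1\} \bigr\}$$
is linearly independent, and so it follows that
$$\langle z_b | z_a\rangle = \begin{cases} p_a & a = b \\ 0 & a \neq b, \end{cases}$$
establishing that $\{|z_0\rangle,\ldots,|z_{n-1}\rangle\}$ is orthogonal.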
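The orthogonality claim can be checked numerically. The sketch below is an illustration with an arbitrary random state, not part of the original argument. It takes an orthonormal eigenbasis of the reduced state from `np.linalg.eigh`, computes the vectors that satisfy equation (1) for that basis, and compares their Gram matrix with the diagonal matrix of eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4                                   # assumed sizes of X and Y

# A random normalized state vector of (X, Y), for illustration only.
psi = rng.normal(size=n * m) + 1j * rng.normal(size=n * m)
psi /= np.linalg.norm(psi)
amp = psi.reshape(n, m)                       # amp[x, y] = amplitude of |x>|y>

# Reduced state of X and an orthonormal eigenbasis (columns of U are the |x_a>).
rho = amp @ amp.conj().T
p, U = np.linalg.eigh(rho)

# Row a of Z is z_a = (<x_a| (x) I_Y)|psi>; valid because the |x_a> are orthonormal.
Z = U.conj().T @ amp

# gram[b, a] = <z_b|z_a>; the claim is that this equals diag(p_0, ..., p_{n-1}).
gram = Z.conj() @ Z.T
print(np.allclose(gram, np.diag(p)))

# Equation (1) still holds for this choice of basis.
rebuilt = sum(np.kron(U[:, a], Z[a]) for a in range(n))
print(np.allclose(rebuilt, psi))
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, so the numbering of the eigenvalue/eigenvector pairs here is simply whatever the routine produces.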
We've nearly obtained a Schmidt decomposition of ∣ψ⟩.
It remains to discard those terms in (1) for which $p_a = 0$ and then write $|z_a\rangle = \sqrt{p_a}\,|y_a\rangle$ for a unit vector $|y_a\rangle$ for each of the remaining terms.
A convenient way to do this begins with the observation that we're free to number the eigenvalue/eigenvector pairs in a spectral decomposition of the reduced state ρ however we wish — so we may assume that the eigenvalues are sorted in decreasing order:
$$p_0 \geq p_1 \geq \cdots \geq p_{n-1}.$$
Letting $r = \operatorname{rank}(\rho)$, we find that $p_0,\ldots,p_{r-1} > 0$ and $p_r = \cdots = p_{n-1} = 0$.
So, we have