Data encoding
Introduction and notation
To use a quantum algorithm, classical data must somehow be brought into a quantum circuit. This is usually referred to as data encoding, but is also called data loading. Recall from previous lessons the notion of a feature mapping, a mapping of data features from one space to another. Just transferring classical data to a quantum computer is a sort of mapping, and could be called a feature mapping. In practice, the built-in feature mappings in Qiskit (like z_feature_map and zz_feature_map) will typically include rotation layers and entangling layers that extend the state to many dimensions in the Hilbert space. This encoding process is a critical part of quantum machine learning algorithms and directly affects their computational capabilities.
Some of the encoding techniques below can be efficiently classically simulated; this is particularly easy to see in encoding methods that yield product states (that is, they do not entangle qubits). And remember that quantum utility is most likely to lie where the quantum-like complexity of the dataset is well-matched by the encoding method. So it is very likely that you will end up writing your own encoding circuits. Here, we show a wide variety of possible encoding strategies simply so that you can compare and contrast them, and see what is possible. There are some very general statements that can be made about the usefulness of encoding techniques. For example, efficient_su2 (see below) with a full entangling scheme is much more likely to capture quantum features of data than methods that yield product states (like z_feature_map). But this does not mean efficient_su2 is sufficient, or sufficiently well-matched to your dataset, to yield a quantum speed-up. That requires careful consideration of the structure of the data being modeled or classified. There is also a balancing act with circuit depth, since many feature maps which fully entangle the qubits in a circuit yield very deep circuits, too deep to get usable results on today's quantum computers.
Notation
A dataset is a set of $M$ data vectors: $\mathcal{X} = \{\vec{x}^{(j)} \,|\, j \in [M]\}$, where each vector is $N$-dimensional, that is, $\vec{x}^{(j)} \in \mathbb{R}^N$. This could be extended to complex data features. In this lesson, we may occasionally use these notations for the full set $\mathcal{X}$ and its specific elements like $\vec{x}^{(j)}$. But we will mostly refer to the loading of a single vector from our dataset at a time, and will often simply refer to a single vector of $N$ features as $\vec{x}$.
Additionally, it is common to use the symbol $\Phi(\vec{x})$ to refer to the feature mapping $\Phi$ of data vector $\vec{x}$. In quantum computing specifically, it is common to refer to such mappings using $U(\vec{x})$, a notation that reinforces the unitary nature of these operations. One could correctly use the same symbol for both; both are feature mappings. Throughout this course, we tend to use:
- $\Phi(\vec{x})$ when discussing feature mappings in machine learning, generally, and
- $U(\vec{x})$ when discussing circuit implementations of feature mappings.
Normalization and information loss
In classical machine learning, training data features are often "normalized" or rescaled, which often improves model performance. Two common ways of doing this are min-max normalization and standardization. In min-max normalization, each feature column of the data matrix (say, feature $k$) is normalized:
$$x'^{(j)}_k = \frac{x^{(j)}_k - \min_i x^{(i)}_k}{\max_i x^{(i)}_k - \min_i x^{(i)}_k},$$
where min and max refer to the minimum and maximum of feature $k$ over the $M$ data vectors in the dataset $\mathcal{X}$. All the feature values then fall in the unit interval: $x'^{(j)}_k \in [0, 1]$ for all $j \in [M]$, $k \in [N]$.
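As a quick check of the min-max formula, here is a minimal plain-Python sketch. It assumes the five example vectors that appear in the code later in this lesson, and the helper name `min_max_normalize` is ours, not part of any library.

```python
# Example dataset: five vectors with three features each
# (values taken from the code examples later in this lesson).
dataset = [(4, 8, 5), (9, 8, 6), (2, 9, 2), (5, 7, 0), (3, 7, 5)]


def min_max_normalize(data):
    """Rescale every feature column to [0, 1] (hypothetical helper)."""
    columns = list(zip(*data))  # one tuple per feature
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    # Assumes no feature is constant; otherwise (hi - lo) would be zero.
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in data
    ]


normalized = min_max_normalize(dataset)
for row in normalized:
    print([round(v, 3) for v in row])
```

Each column is rescaled independently, so the smallest value of every feature becomes 0 and the largest becomes 1.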
Normalization is also a fundamental concept in quantum mechanics and quantum computing, but it is slightly different from min-max normalization. Normalization in quantum mechanics requires that the length (in the context of quantum computing, the 2-norm) of a state vector $|\psi\rangle$ is equal to unity: $\langle\psi|\psi\rangle = 1$, ensuring that measurement probabilities sum to 1. The state is normalized by dividing by the 2-norm; that is, by rescaling
$$|\psi\rangle \rightarrow \frac{|\psi\rangle}{\sqrt{\langle\psi|\psi\rangle}}.$$
In quantum computing and quantum mechanics, this is not a normalization imposed by people on the data, but a fundamental property of quantum states. Depending on your encoding scheme, this constraint may affect how your data are rescaled. For example, in amplitude encoding (see below), the data vector is normalized as is required by quantum mechanics, and this affects the scaling of the data being encoded. In phase encoding, feature values are recommended to be rescaled as $x^{(j)}_k \in (0, 2\pi]$ so that there is no information loss due to the modulo-$2\pi$ effect of encoding to a qubit phase angle[1,2].
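The modulo-$2\pi$ effect can be seen with a few lines of plain Python. This is only a sketch: `rescale_to_one_period` and its `margin` parameter are hypothetical choices of ours for mapping values into $(0, 2\pi]$, not a prescribed method.

```python
import cmath
import math

TWO_PI = 2 * math.pi


def phase_factor(angle):
    """Relative phase e^{i*angle} that a phase rotation would imprint."""
    return cmath.exp(1j * angle)


# Two different raw feature values that differ by exactly 2*pi ...
a, b = 0.5, 0.5 + TWO_PI
# ... produce the same phase factor, so they cannot be told apart.
collision = abs(phase_factor(a) - phase_factor(b)) < 1e-12


def rescale_to_one_period(values, margin=1.0):
    """Map values linearly into (0, 2*pi] (hypothetical helper).

    `margin` keeps the smallest value strictly above 0, so that it does
    not coincide with the largest value at 2*pi.
    """
    lo, hi = min(values), max(values)
    return [TWO_PI * (v - lo + margin) / (hi - lo + margin) for v in values]


feature = [4, 9, 2, 5, 3]  # first feature of each example vector
angles = rescale_to_one_period(feature)
print(collision, [round(t, 3) for t in angles])
```

After rescaling, distinct feature values land on distinct angles within a single period, so no two of them share a phase.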
Methods of encoding
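In the next few sections, we will refer to a small example classical dataset $\mathcal{X}_\text{ex}$ consisting of $M = 5$ data vectors, each with $N = 3$ features:
$$\mathcal{X}_\text{ex} = \{(4, 8, 5), (9, 8, 6), (2, 9, 2), (5, 7, 0), (3, 7, 5)\}$$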
In the notation introduced above, we might say the $1^\text{st}$ feature of the $4^\text{th}$ data vector in our set $\mathcal{X}_\text{ex}$ is $x^{(4)}_1 = 5$, for example.
Basis encoding
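Basis encoding encodes a classical $P$-bit string into a computational basis state of a $P$-qubit system. Take for example the feature value 5. This can be represented as a 4-bit string as $(0101)$, and by a 4-qubit system as the quantum state $|0101\rangle$. More generally, for a $P$-bit string $x_k = (b_1, b_2, \ldots, b_P)$, the corresponding $P$-qubit state is $|x_k\rangle = |b_1, b_2, \ldots, b_P\rangle$ with $b_n \in \{0, 1\}$ for $n = 1, \ldots, P$. Note that this is just for a single feature.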
Basis encoding in quantum computing represents each classical bit as a separate qubit, mapping the binary representation of data directly onto quantum states in the computational basis. When multiple features need to be encoded, each feature is first converted to its binary form and then assigned to a distinct group of qubits — one group per feature — where each qubit reflects a bit in the binary representation of that feature.
As an example, let us encode the vector (5, 7, 0).
Suppose all features are stored in four bits (more than we need, but enough to represent any integer that is single-digit in base 10):
5 → binary 0101
7 → binary 0111
0 → binary 0000
These bit strings are assigned to three sets of four qubits, so the overall 12-qubit basis state is:
$$|0101\ 0111\ 0000\rangle$$
Here, the first four qubits represent the first feature, the next four qubits the second feature, and the last four qubits the third feature. The code below converts the data vector (5,7,0) to a quantum state, and is generalized to do so for other single-digit features.
from qiskit import QuantumCircuit
# Data point to encode
x = 5 # binary: 0101
y = 7 # binary: 0111
z = 0 # binary: 0000
# Convert each to 4-bit binary list
x_bits = [int(b) for b in format(x, "04b")] # [0,1,0,1]
y_bits = [int(b) for b in format(y, "04b")] # [0,1,1,1]
z_bits = [int(b) for b in format(z, "04b")] # [0,0,0,0]
# Combine all bits
all_bits = x_bits + y_bits + z_bits # [0,1,0,1,0,1,1,1,0,0,0,0]
# Initialize a 12-qubit quantum circuit
qc = QuantumCircuit(12)
# Apply x-gates where the bit is 1
for idx, bit in enumerate(all_bits):
    if bit == 1:
        qc.x(idx)
qc.draw("mpl")
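The bookkeeping in this example can be sketched in plain Python, without Qiskit. The helper `basis_encode` is a hypothetical name of ours; it adds a range check, and the last lines illustrate one thing to keep in mind: because Qiskit writes ket labels with qubit 0 on the far right, a circuit that puts bit $i$ on qubit $i$ yields a label that is the reverse of the bit string as written in the text.

```python
def basis_encode(features, bits_per_feature=4):
    """Return the bit list for basis encoding (hypothetical helper)."""
    bits = []
    for value in features:
        if not 0 <= value < 2**bits_per_feature:
            raise ValueError(f"{value} does not fit in {bits_per_feature} bits")
        bits.extend(int(b) for b in format(value, f"0{bits_per_feature}b"))
    return bits


bits = basis_encode([5, 7, 0])

# Qubits that receive an X gate when bit i is placed on qubit i
flipped_qubits = [i for i, bit in enumerate(bits) if bit == 1]

# Ket label as written in the text (first bit on the left)
text_label = "".join(str(b) for b in bits)

# Same state in Qiskit's labeling convention (qubit 0 on the right)
qiskit_label = text_label[::-1]

print(flipped_qubits, text_label, qiskit_label)
```

Neither ordering is more correct than the other; what matters is reading measurement results with the same convention used when encoding.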

Check your understanding
Read the question below, think about your answer, then click the triangle to reveal the solution.
Write code to encode the first vector in our example dataset $\mathcal{X}_\text{ex}$:
$$\vec{x}^{(1)} = (4, 8, 5)$$
using basis encoding.
Answer:
import math
from qiskit import QuantumCircuit
# Data point to encode
x = 4 # binary: 0100
y = 8 # binary: 1000
z = 5 # binary: 0101
# Convert each to 4-bit binary list
x_bits = [int(b) for b in format(x, '04b')] # [0,1,0,0]
y_bits = [int(b) for b in format(y, '04b')] # [1,0,0,0]
z_bits = [int(b) for b in format(z, '04b')] # [0,1,0,1]
# Combine all bits
all_bits = x_bits + y_bits + z_bits # [0,1,0,0,1,0,0,0,0,1,0,1]
# Initialize a 12-qubit quantum circuit
qc = QuantumCircuit(12)
# Apply x-gates where the bit is 1
for idx, bit in enumerate(all_bits):
    if bit == 1:
        qc.x(idx)
qc.draw('mpl')
Amplitude encoding
Amplitude encoding encodes data into the amplitudes of a quantum state. It represents a normalized classical $N$-dimensional data vector, $\vec{x}^{(j)}$, as the amplitudes of an $n$-qubit quantum state, $|\psi_x\rangle$:
$$|\psi^{(j)}_x\rangle = \frac{1}{\alpha}\sum_{i=1}^{N} x^{(j)}_i |i\rangle,$$
where $N$ is the same dimension of the data vectors as before, $x^{(j)}_i$ is the $i^\text{th}$ element of $\vec{x}^{(j)}$, and $|i\rangle$ is the $i^\text{th}$ computational basis state. Here, $\alpha$ is a normalization constant to be determined from the data being encoded. This is the normalization condition imposed by quantum mechanics:
$$\sum_{i=1}^{N} \left|x^{(j)}_i\right|^2 = |\alpha|^2.$$
In general, this is a different condition than the min-max normalization used for each feature across all data vectors. Precisely how this is navigated will depend on your problem. But there is no way around the quantum mechanical normalization condition above.
In amplitude encoding, each feature in a data vector is stored as an amplitude of a different basis state. As a system of $n$ qubits provides $2^n$ amplitudes, amplitude encoding of $N$ features requires $n \ge \log_2(N)$ qubits.
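As an example, let's encode the first vector in our example dataset, $\vec{x}^{(1)} = (4, 8, 5)$, using amplitude encoding. Normalizing the resulting vector, we get:
$$\sum_{i=1}^{N} \left|x^{(1)}_i\right|^2 = 4^2 + 8^2 + 5^2 = 105 = |\alpha|^2 \quad\rightarrow\quad \alpha = \sqrt{105},$$
and the resulting 2-qubit quantum state would be:
$$|\psi(\vec{x}^{(1)})\rangle = \frac{1}{\sqrt{105}}\left(4|00\rangle + 8|01\rangle + 5|10\rangle + 0|11\rangle\right).$$
In the example above, the number of features in the vector, $N = 3$, is not a power of 2. When $N$ is not a power of 2, we simply choose a value for the number of qubits $n$ such that $2^n \ge N$ and pad the amplitude vector with uninformative constants (here, a zero).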
As with basis encoding, once we have calculated which state will encode our data, we can prepare it in Qiskit. Here we use the initialize function:
import math
desired_state = [
1 / math.sqrt(105) * 4,
1 / math.sqrt(105) * 8,
1 / math.sqrt(105) * 5,
1 / math.sqrt(105) * 0,
]
qc = QuantumCircuit(2)
qc.initialize(desired_state, [0, 1])
qc.decompose(reps=5).draw(output="mpl")
An advantage of amplitude encoding is the aforementioned requirement of only $\log_2(N)$ qubits to encode $N$ features. However, subsequent algorithms must operate on the amplitudes of a quantum state, and methods to prepare and measure such quantum states tend not to be efficient.
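To see what "operating on amplitudes" means for readout, the following plain-Python sketch applies the Born rule to the state encoding $(4, 8, 5)$. It is an illustration only: it computes ideal probabilities, whereas a real device would only provide finite-shot estimates of them.

```python
import math

vector = [4, 8, 5]
padded = vector + [0] * (4 - len(vector))  # pad to 2**2 amplitudes
alpha = math.sqrt(sum(v * v for v in padded))  # 2-norm, here sqrt(105)
amplitudes = [v / alpha for v in padded]

# Born rule: the probability of each 2-qubit outcome is |amplitude|^2
labels = ["00", "01", "10", "11"]
probabilities = {lab: amp**2 for lab, amp in zip(labels, amplitudes)}

for lab, prob in probabilities.items():
    print(f"|{lab}>: {prob:.4f}")
```

The features appear only through the probabilities $16/105$, $64/105$, $25/105$ and $0$, so recovering them from measurements requires many repetitions, and the signs and the overall scale $\alpha$ are not directly visible.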
Check your understanding
Read the questions below, think about your answers, then click the triangles to reveal the solutions.
Write down the normalized state for encoding the following vector (made of two vectors from our example dataset):
$$\vec{x} = (9, 8, 6, 2, 9, 2)$$
using amplitude encoding.
Answer:
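To encode 6 numbers, we will need to have at least 6 available states on whose amplitudes we can encode. This will require 3 qubits. Using an unknown normalization factor $\alpha$, we can write this as:
$$|\psi\rangle = \frac{1}{\alpha}\left(9|000\rangle + 8|001\rangle + 6|010\rangle + 2|011\rangle + 9|100\rangle + 2|101\rangle + 0|110\rangle + 0|111\rangle\right)$$
Note that
$$\alpha^2 = 9^2 + 8^2 + 6^2 + 2^2 + 9^2 + 2^2 = 270.$$
So finally,
$$|\psi\rangle = \frac{1}{\sqrt{270}}\left(9|000\rangle + 8|001\rangle + 6|010\rangle + 2|011\rangle + 9|100\rangle + 2|101\rangle + 0|110\rangle + 0|111\rangle\right)$$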
For the same data vector $\vec{x} = (9, 8, 6, 2, 9, 2)$, write code to create a circuit that loads these data features using amplitude encoding.
Answer:
desired_state = [
9 / math.sqrt(270),
8 / math.sqrt(270),
6 / math.sqrt(270),
2 / math.sqrt(270),
9 / math.sqrt(270),
2 / math.sqrt(270),
0,
0,
]
print(desired_state)
qc = QuantumCircuit(3)
qc.initialize(desired_state, [0, 1, 2])
qc.decompose(reps=8).draw(output="mpl")
[0.5477225575051662, 0.48686449556014766, 0.36514837167011077, 0.12171612389003691, 0.5477225575051662, 0.12171612389003691, 0, 0]
You may need to deal with very large data vectors. Consider the vector
$$\vec{x} = (4, 8, 5, 9, 8, 6, 2, 9, 2, 5, 7, 0, 3, 7, 5)$$
Write code to automate the normalization, and generate a quantum circuit for amplitude encoding.
Answer:
There are many possible answers. Here is code that prints a few steps along the way:
import numpy as np

init_list = [4, 8, 5, 9, 8, 6, 2, 9, 2, 5, 7, 0, 3, 7, 5]

# Smallest number of qubits whose 2**qubits amplitudes can hold all features
qubits = int(np.ceil(np.log2(len(init_list))))
need_length = 2**qubits

# Pad with zeros up to the next power of 2
pad = need_length - len(init_list)
init_list = init_list + [0] * pad

init_array = np.array(init_list)  # Unnormalized data vector
length = np.sqrt(np.sum(init_array**2))  # Vector length (2-norm)
norm_array = init_array / length  # Normalized array
print("Normalized array:")
print(norm_array)
print()

qubit_numbers = list(range(qubits))
print(qubit_numbers)

qc = QuantumCircuit(qubits)
qc.initialize(norm_array, qubit_numbers)
qc.decompose(reps=7).draw(output="mpl")
Normalized array: [0.17342199 0.34684399 0.21677749 0.39019949 0.34684399 0.26013299 0.086711 0.39019949 0.086711 0.21677749 0.30348849 0. 0.1300665 0.30348849 0.21677749 0. ]
[0, 1, 2, 3]

Do you see advantages to amplitude encoding over basis encoding? If so, explain.
Answer:
There may be several answers. One answer is that, given the fixed ordering of the basis states, this amplitude encoding preserves the order of the numbers encoded. The data are also usually encoded more densely, since $n$ qubits can hold up to $2^n$ feature values rather than $n$ bits.
A benefit of amplitude encoding is that only $\log_2(N)$ qubits are required for an $N$-dimensional ($N$-feature) data vector $\vec{x}$. However, amplitude encoding is generally an inefficient procedure that requires arbitrary state preparation, for which the number of CNOT gates grows exponentially with the number of qubits. Stated differently, the state preparation has a polynomial runtime complexity of $\mathcal{O}(N)$ in the number of dimensions, where $N = 2^n$ and $n$ is the number of qubits. Amplitude encoding "provides an exponential saving in space at the cost of an exponential increase in time"[3]; however, runtimes of $\mathcal{O}(\log N)$ are achievable in certain cases[4]. For an end-to-end quantum speedup, the data loading runtime complexity needs to be considered.
Angle encoding
Angle encoding is of interest in many QML models using Pauli feature maps such as quantum support vector machines (QSVMs) and variational quantum circuits (VQCs), among others. Angle encoding is closely related to phase encoding and dense angle encoding, which are presented below. Here we will use "angle encoding" to refer to a rotation in $\theta$, that is, a rotation away from the $z$ axis accomplished for example by an $R_X$ gate or an $R_Y$ gate[1,3]. Really, one can encode data in any rotation or combination of rotations. But $R_Y$ is common in the literature, so we emphasize it here.
When applied to a single qubit, angle encoding imparts a Y-axis rotation proportional to the data value. Consider the encoding of a single ($k^\text{th}$) feature from the $j^\text{th}$ data vector in a dataset, $x^{(j)}_k$:
$$|x^{(j)}_k\rangle = R_Y(\theta = x^{(j)}_k)|0\rangle = \cos\left(\frac{x^{(j)}_k}{2}\right)|0\rangle + \sin\left(\frac{x^{(j)}_k}{2}\right)|1\rangle$$
Alternatively, angle encoding can be performed using $R_X(\theta)$ gates, although the encoded state would have a complex relative phase compared to $R_Y(\theta)$.
Angle encoding is different from the previous two methods discussed in several ways. In angle encoding:
- Each feature value is mapped to a corresponding qubit, $x^{(j)}_k \rightarrow Q_k$, leaving the qubits in a product state.
- One numerical value is encoded at a time, rather than a whole set of features from a data point.
- $n$ qubits are required for $N$ data features, where $n \le N$. Often equality holds here. We'll see how $n < N$ is possible in the next few sections.
- The resulting circuit has constant depth (typically the depth is 1 prior to transpilation).
The constant-depth quantum circuit makes it particularly amenable to current quantum hardware. One additional feature of encoding our data using $R_Y$ (and specifically, our choice to use Y-axis angle encoding) is that it creates real-valued quantum states that can be useful for certain applications. For Y-axis rotation, data is mapped with a Y-axis rotation gate by a real-valued angle $\theta \in (0, 2\pi]$ (Qiskit RYGate). As with phase encoding (see below), we recommend that you rescale data so that $x^{(j)}_k \in (0, 2\pi]$, preventing information loss and other unwanted effects.
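Both points, real-valued amplitudes and the need to stay within one period, can be checked with a short plain-Python sketch of $R_Y(\theta)|0\rangle$. The helper names are ours and no quantum library is assumed.

```python
import math


def ry_on_zero(theta):
    """Amplitudes of R_Y(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return (math.cos(theta / 2), math.sin(theta / 2))


def probabilities(state):
    """Measurement probabilities for outcomes 0 and 1."""
    return tuple(amp**2 for amp in state)


state = ry_on_zero(math.pi / 2)  # real amplitudes, equal superposition
wrapped = ry_on_zero(math.pi / 2 + 2 * math.pi)  # same angle plus 2*pi

print(state, probabilities(state))
print(wrapped, probabilities(wrapped))
```

Adding $2\pi$ to the angle only flips the overall sign of the state, which is an unobservable global phase, so the two feature values would be indistinguishable after encoding.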
The following Qiskit code rotates a single qubit from an initial state $|0\rangle$ to encode a data value $x^{(j)}_k = \frac{\pi}{2}$.
from qiskit.quantum_info import Statevector
from math import pi
qc = QuantumCircuit(1)
state1 = Statevector.from_instruction(qc)
qc.ry(pi / 2, 0)  # RY gate rotates about the Y axis by an angle pi/2
state2 = Statevector.from_instruction(qc)
states = state1, state2
We will define a function to visualize the action on the state vector. The details of the function definition are not important, but the ability to visualize the state vectors and their changes is important.
import numpy as np
from qiskit.visualization.bloch import Bloch
from qiskit.visualization.state_visualization import _bloch_multivector_data
def plot_Nstates(states, axis, plot_trace_points=True):
    """This function plots N states to 1 Bloch sphere"""
    bloch_vecs = [_bloch_multivector_data(s)[0] for s in states]

    if axis is None:
        bloch_plot = Bloch()
    else:
        bloch_plot = Bloch(axes=axis)

    bloch_plot.add_vectors(bloch_vecs)

    if len(states) > 1:

        def rgba_map(x, num):
            g = (0.95 - 0.05) / (num - 1)
            i = 0.95 - g * num
            y = g * x + i
            return (0.0, y, 0.0, 0.7)

        num = len(states)
        bloch_plot.vector_color = [rgba_map(x, num) for x in range(1, num + 1)]

    bloch_plot.vector_width = 3
    bloch_plot.vector_style = "simple"

    if plot_trace_points:

        def trace_points(bloch_vec1, bloch_vec2):
            # bloch_vec = (x,y,z)
            n_points = 15
            thetas = np.arccos([bloch_vec1[2], bloch_vec2[2]])
            phis = np.arctan2(
                [bloch_vec1[1], bloch_vec2[1]], [bloch_vec1[0], bloch_vec2[0]]
            )
            if phis[1] < 0:
                phis[1] = phis[1] + 2 * pi
            angles0 = np.linspace(phis[0], phis[1], n_points)
            angles1 = np.linspace(thetas[0], thetas[1], n_points)

            xp = np.cos(angles0) * np.sin(angles1)
            yp = np.sin(angles0) * np.sin(angles1)
            zp = np.cos(angles1)
            pnts = [xp, yp, zp]
            bloch_plot.add_points(pnts)
            bloch_plot.point_color = "k"
            bloch_plot.point_size = [4] * len(bloch_plot.points)
            bloch_plot.point_marker = ["o"]

        for i in range(len(bloch_vecs) - 1):
            trace_points(bloch_vecs[i], bloch_vecs[i + 1])

    bloch_plot.sphere_alpha = 0.05
    bloch_plot.frame_alpha = 0.15
    bloch_plot.figsize = [4, 4]

    bloch_plot.render()


plot_Nstates(states, axis=None, plot_trace_points=True)
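That was just a single feature of a single data vector. When encoding $N$ features into the rotation angles of $N$ qubits, say for the $j^\text{th}$ data vector $\vec{x}^{(j)} = (x_1, \ldots, x_N)$, the encoded product state will look like this:
$$|\vec{x}^{(j)}\rangle = \bigotimes_{k=1}^{N}\left[\cos(x^{(j)}_k)|0\rangle + \sin(x^{(j)}_k)|1\rangle\right]$$
We note that this is equivalent to
$$|\vec{x}^{(j)}\rangle = \bigotimes_{k=1}^{N} R_Y(2x^{(j)}_k)|0\rangle,$$
where the factor of 2 in the rotation angle cancels the half-angle in $R_Y(\theta)|0\rangle = \cos(\theta/2)|0\rangle + \sin(\theta/2)|1\rangle$.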
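A multi-qubit product state of this kind can be written out explicitly with a Kronecker product. The sketch below is plain Python and assumes the convention that each qubit is prepared as $\cos(x_k)|0\rangle + \sin(x_k)|1\rangle$, that is, $R_Y(2x_k)|0\rangle$, with the first feature as the leftmost tensor factor. The helper names and the two-feature input are ours.

```python
import math
from functools import reduce


def kron(a, b):
    """Kronecker product of two amplitude lists."""
    return [x * y for x in a for y in b]


def angle_encode(features):
    """Product state with one qubit per feature, R_Y(2*x_k)|0> each."""
    qubits = [[math.cos(x), math.sin(x)] for x in features]
    return reduce(kron, qubits)


# Hypothetical two-feature input, chosen so the amplitudes are easy to check
state = angle_encode([math.pi / 6, math.pi / 3])
print([round(a, 4) for a in state])
print("norm squared:", round(sum(a * a for a in state), 12))
```

Every amplitude is a product of single-qubit factors, which is exactly what makes such states easy to simulate classically.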
Check your understanding
Read the questions below, think about your answers, then click the triangles to reveal the solutions.
Encode the data vector $\vec{x} = (0, \pi/4, \pi/2)$ using angle encoding, as described above (that is, rotating qubit $k$ by $R_Y(2x_k)$).
Answer:
qc = QuantumCircuit(3)
qc.ry(0, 0)
qc.ry(2 * math.pi / 4, 1)
qc.ry(2 * math.pi / 2, 2)
qc.draw(output="mpl")
Using angle encoding as described above, how many qubits are required to encode 5 features?
Answer: 5
Phase encoding
Phase encoding is very similar to the angle encoding described above. The phase angle of a qubit is a real-valued angle $\phi$ about the $z$-axis from the $+x$-axis. Data are mapped with a phase rotation, $P(\phi) = e^{i\phi/2}R_Z(\phi)$, where $\phi \in (0, 2\pi]$ (see Qiskit PhaseGate for more information). It is recommended to rescale data so that $x^{(j)}_k \in (0, 2\pi]$. This prevents information loss and other potentially unwanted effects[1,2].
A qubit is often initialized in the state $|0\rangle$, which is an eigenstate of the phase rotation operator, meaning that the qubit state first needs to be rotated for phase encoding to be implemented. It therefore makes sense to initialize the state with a Hadamard gate: $H|0\rangle = |+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. Phase encoding on a single qubit means imparting a relative phase proportional to the data value:
$$|x^{(j)}_k\rangle = P(\phi = x^{(j)}_k)|+\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i x^{(j)}_k}|1\rangle\right)$$
The phase encoding procedure maps each feature value to the phase of a corresponding qubit, $x^{(j)}_k \rightarrow Q_k$. In total, phase encoding has a circuit depth of 2, including the Hadamard layer, which makes it an efficient encoding scheme. The phase-encoded multi-qubit state ($n$ qubits for $N = n$ features) is a product state:
$$|\vec{x}^{(j)}\rangle = \bigotimes_{k=1}^{N} P(\phi = x^{(j)}_k)|+\rangle = \frac{1}{\sqrt{2^N}}\bigotimes_{k=1}^{N}\left(|0\rangle + e^{i x^{(j)}_k}|1\rangle\right)$$
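One consequence of storing data in a relative phase is worth checking numerically: the computational-basis probabilities do not depend on the feature at all. The plain-Python sketch below uses helper names of ours and three arbitrary test angles.

```python
import cmath
import math


def phase_encode_qubit(x):
    """Amplitudes of P(x) H |0> = (|0> + e^{ix}|1>) / sqrt(2)."""
    return (1 / math.sqrt(2), cmath.exp(1j * x) / math.sqrt(2))


results = []
for x in (math.pi / 2, 1.234, 3.0):
    amp0, amp1 = phase_encode_qubit(x)
    p0, p1 = abs(amp0) ** 2, abs(amp1) ** 2  # always 1/2 each
    recovered = cmath.phase(amp1 / amp0)  # relative phase in (-pi, pi]
    results.append((x, p0, p1, recovered))
    print(f"x={x:.3f}  P(0)={p0:.3f}  P(1)={p1:.3f}  relative phase={recovered:.3f}")
```

The feature is fully present in the state, but it only becomes visible after further gates (for example another Hadamard) turn the phase into a population difference.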
The following Qiskit code first prepares the initial state of a single qubit by rotating it with a Hadamard gate, then rotates it again using a phase gate to encode a data feature $x^{(j)}_k = \frac{\pi}{2}$.
qc = QuantumCircuit(1)
qc.h(0) # Hadamard gate rotates state down to Bloch equator
state1 = Statevector.from_instruction(qc)
qc.p(pi / 2, 0) # Phase gate rotates by an angle pi/2
state2 = Statevector.from_instruction(qc)
states = state1, state2
qc.draw("mpl", scale=1)
We can visualize the rotation in $\phi$ using the plot_Nstates function we defined.
plot_Nstates(states, axis=None, plot_trace_points=True)
The Bloch sphere plot shows the Z-axis rotation $|+\rangle \rightarrow P(\frac{\pi}{2})|+\rangle$, where $x^{(j)}_k = \frac{\pi}{2}$. The light green arrow shows the final state.
Phase encoding is used in many quantum feature maps, particularly $Z$ and $ZZ$ feature maps, and general Pauli feature maps, among others.
Check your understanding
Read the questions below, think about your answers, then click the triangles to reveal the solutions.
How many qubits are required in order to use phase encoding as described above to store 8 features?
Answer: 8
Write code to encode the vector $\vec{x} = (4, 8, 5, 9, 8, 6, 2, 9, 2, 5, 7, 0)$ using phase encoding.
Answer:
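There may be many answers. Here is one example, which uses an $R_Z$ rotation (equivalent to the phase gate up to a global phase) and rescales by the largest feature value. Note that with this particular rescaling the feature values 0 and 9 map to the angles 0 and $2\pi$, which give the same state; a rescaling into $(0, 2\pi]$ would avoid this.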
phase_data = [4, 8, 5, 9, 8, 6, 2, 9, 2, 5, 7, 0]
qc = QuantumCircuit(len(phase_data))
for i in range(0, len(phase_data)):
    qc.h(i)
    qc.rz(phase_data[i] * 2 * math.pi / float(max(phase_data)), i)
qc.draw(output="mpl")
Dense angle encoding
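Dense angle encoding (DAE) is a combination of angle encoding and phase encoding. DAE allows two feature values to be encoded in a single qubit: one with a $Y$-axis rotation angle, and the other with a $Z$-axis rotation angle: $x_k, x_\ell \rightarrow \theta, \phi$. It encodes two features as follows (up to an unobservable global phase):
$$|x_k, x_\ell\rangle = R_Z(\phi = x_\ell)\,R_Y(\theta = x_k)|0\rangle = \cos\left(\frac{x_k}{2}\right)|0\rangle + e^{i x_\ell}\sin\left(\frac{x_k}{2}\right)|1\rangle$$
Encoding two data features to one qubit results in a $2\times$ reduction in the number of qubits required for the encoding. Extending this to more features, the data vector $\vec{x} = (x_1, \ldots, x_N)$ can be encoded as:
$$|\vec{x}\rangle = \bigotimes_{k=1}^{N/2}\left[\cos\left(\frac{x_{2k-1}}{2}\right)|0\rangle + e^{i x_{2k}}\sin\left(\frac{x_{2k-1}}{2}\right)|1\rangle\right]$$
DAE can be generalized to arbitrary functions of the two features instead of the sinusoidal functions used here. This is called general qubit encoding[7].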
As an example of DAE, the code below encodes and visualizes the encoding of the features $x_k = \theta = 3\pi/8$ and $x_\ell = \phi = 7\pi/4$.
qc = QuantumCircuit(1)
state1 = Statevector.from_instruction(qc)
qc.ry(3 * pi / 8, 0)
state2 = Statevector.from_instruction(qc)
qc.rz(7 * pi / 4, 0)
state3 = Statevector.from_instruction(qc)
states = state1, state2, state3
plot_Nstates(states, axis=None, plot_trace_points=True)
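The same single-qubit state can also be checked numerically without plotting. The plain-Python sketch below assumes the form $\cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$ with the global phase dropped; the helper names are ours. It computes the Bloch vector and then reads both features back from it.

```python
import cmath
import math


def dense_encode_qubit(theta, phi):
    """cos(theta/2)|0> + e^{i*phi} sin(theta/2)|1> (global phase dropped)."""
    return (complex(math.cos(theta / 2)), cmath.exp(1j * phi) * math.sin(theta / 2))


def bloch_vector(state):
    """(x, y, z) coordinates of a single-qubit pure state."""
    a, b = state
    overlap = a.conjugate() * b
    return (2 * overlap.real, 2 * overlap.imag, abs(a) ** 2 - abs(b) ** 2)


theta, phi = 3 * math.pi / 8, 7 * math.pi / 4
x, y, z = bloch_vector(dense_encode_qubit(theta, phi))

# Both features can be read back from the direction of the Bloch vector
theta_back = math.acos(z)
phi_back = math.atan2(y, x) % (2 * math.pi)
print(round(theta_back / math.pi, 4), round(phi_back / math.pi, 4))
```

The polar angle carries the first feature and the azimuthal angle the second, which is why one qubit can hold two real numbers in this scheme.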
Check your understanding
Read the questions below, think about your answers, then click the triangles to reveal the solutions.
Given the treatment above, how many qubits are needed to encode 6 features using dense angle encoding?
Answer: 3
Write code to load the vector $\vec{x} = (4, 8, 5, 9, 8, 6, 2, 9, 2, 5, 7, 0, 3, 7, 5)$ using dense angle encoding.
Answer:
Note that we have padded the list with a "0" to avoid the problem of there being a single unused parameter in our encoding scheme.
dense_data = [4, 8, 5, 9, 8, 6, 2, 9, 2, 5, 7, 0, 3, 7, 5, 0]
qc = QuantumCircuit(int(len(dense_data) / 2))
entry = 0
for i in range(0, int(len(dense_data) / 2)):
    qc.ry(dense_data[entry] * 2 * math.pi / float(max(dense_data)), i)
    entry = entry + 1
    qc.rz(dense_data[entry] * 2 * math.pi / float(max(dense_data)), i)
    entry = entry + 1
qc.draw(output="mpl")
Encoding with built-in feature maps
Encoding at arbitrary points
Angle encoding, phase encoding, and dense angle encoding prepare product states with a feature encoded on each qubit (or two features per qubit). This is different from amplitude encoding, which generally makes use of entangled states and in which there is not a 1:1 correspondence between data feature and qubit. (Basis encoding also yields product states, but it spreads each feature over a group of qubits.) In amplitude encoding, for example, you might have one feature as the amplitude of the state $|01\rangle$ and another feature as the amplitude for $|10\rangle$. Generally, methods that encode in product states yield shallower circuits and can store 1 or 2 features on each qubit. Methods that use entanglement and associate a feature with a state rather than a qubit result in deeper circuits, and can store more features per qubit on average.
But encoding need not be entirely in product states or entirely in entangled states as in amplitude encoding. Indeed, many encoding schemes built into Qiskit allow encoding both before and after an entanglement layer, as opposed to just at the beginning. This is known as "data reuploading". For related work, see references [5] and [6].
In this section, we will use and visualize a few of the built-in encoding schemes. All the methods in this section encode $N$ features as rotations on $N$ parameterized gates on $n$ qubits, where $n \le N$. Note that maximizing data loading for a given number of qubits is not the only consideration. In many cases, circuit depth may be an even more important consideration than qubit count.
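To keep the qubit requirements of the schemes seen so far side by side, here is a small plain-Python sketch. The function name and the default of four bits per feature for basis encoding are assumptions of ours, matching the earlier examples.

```python
import math


def qubits_required(num_features, bits_per_feature=4):
    """Qubit counts for the encodings discussed so far (hypothetical helper)."""
    return {
        "basis": num_features * bits_per_feature,
        "amplitude": max(1, math.ceil(math.log2(num_features))),
        "angle": num_features,
        "phase": num_features,
        "dense angle": math.ceil(num_features / 2),
    }


for n_features in (6, 15):
    print(n_features, qubits_required(n_features))
```

The counts say nothing about circuit depth: amplitude encoding needs the fewest qubits but the deepest state-preparation circuit, while the product-state encodings need more qubits and only one or two layers of gates.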
Efficient SU2
A common and useful example of encoding with entanglement is Qiskit's efficient_su2 circuit. Impressively, this circuit can, for example, encode 8 features on only 2 qubits. Let's see this, and then try to understand how it is possible.
from qiskit.circuit.library import efficient_su2
circuit = efficient_su2(num_qubits=2, reps=1, insert_barriers=True)
circuit.decompose().draw(output="mpl")
As we write our state, we will use the Qiskit convention that least-significant qubits are ordered to the far right, as in $|q_1 q_0\rangle$ or $|q_2 q_1 q_0\rangle$. These states can become very complicated very quickly, and this rare example may help explain why such states are seldom written out explicitly.
Our system starts in the state Up to the first barrier (a point we label ), our states are:
That's just dense encoding, which we've seen before. Now after the CNOT gate, at the second barrier (), our state is
We now apply the last set of single-qubit rotations and collect like states to obtain:
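This is likely too complicated to parse. Instead, just step back and think about how many parameters we loaded onto the state: eight. But we have a two-qubit state with just four computational basis states. At first glance, it may appear that we have loaded more parameters than makes sense, since the final state can be written as
$$|\psi\rangle = c_{00}|00\rangle + c_{01}|01\rangle + c_{10}|10\rangle + c_{11}|11\rangle.$$
Note, however, that each prefactor is complex! Written like this, with real $a$ and $b$:
$$|\psi\rangle = (a_{00} + i b_{00})|00\rangle + (a_{01} + i b_{01})|01\rangle + (a_{10} + i b_{10})|10\rangle + (a_{11} + i b_{11})|11\rangle$$
One can see that we do, indeed, have eight real numbers on the state on which to encode our eight features. (Strictly speaking, normalization and the unobservable global phase remove two of these, so the eight rotation angles are not all independent in the final state.)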
By increasing the number of qubits and increasing the number of repetitions of entangling and rotation layers, one can encode much more data. Writing out the wave functions quickly becomes intractable. But we can still see the encoding in action.
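The parameter counting behind this statement can be sketched in a few lines of plain Python. The sketch assumes two single-qubit rotations ($R_Y$ and $R_Z$) per qubit in each rotation layer and one more rotation layer than entangling layers, as in the circuits drawn in this section; other gate choices would change the count. The function names are ours.

```python
import math


def rotation_parameter_count(num_qubits, reps, rotations_per_qubit=2):
    """Number of rotation angles with (reps + 1) rotation layers."""
    return rotations_per_qubit * num_qubits * (reps + 1)


def reps_needed(num_features, num_qubits, rotations_per_qubit=2):
    """Smallest reps whose rotation layers can hold all features."""
    per_layer = rotations_per_qubit * num_qubits
    return max(0, math.ceil(num_features / per_layer) - 1)


print(rotation_parameter_count(num_qubits=2, reps=1))  # two-qubit example above
print(rotation_parameter_count(num_qubits=3, reps=1))
print(reps_needed(num_features=15, num_qubits=3))
```

Either more qubits or more repetitions increase the number of available angles, but each extra repetition also adds an entangling layer and therefore depth.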
Here we encode the data vector $\vec{x} = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2)$ with 12 features, on a 3-qubit efficient_su2 circuit, using each of the parameterized gates to encode a different feature.
In this data vector, the features are shown in a particular order. In isolation, it doesn't matter if they are encoded in this order or in the reverse. What is important is keeping track of it and being consistent. Note in the circuit diagram that efficient_su2 assumes a certain ordering of encoding, specifically filling the first layer of parameterized gates from qubit 0 to qubit 2, and then moving to the next layer. This is neither consistent nor inconsistent with little-endian notation, since here the data features cannot be ordered by qubit a priori, before an encoding circuit has been specified.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
circuit = efficient_su2(num_qubits=3, reps=1, insert_barriers=True)
encode = circuit.assign_parameters(x)
encode.decompose().draw(output="mpl")
Instead of increasing the number of qubits, you might choose to increase the number of repetitions of entangling and rotation layers. But there are limits to how many repetitions are useful. As previously stated, there is a tradeoff: circuits with more qubits or more repetitions of entangling and rotation layers may store more parameters, but do so with greater circuit depth. We will return to the depths of some built-in feature maps, below. The next few encoding methods that are built into Qiskit have "feature map" as part of their names. Let us reiterate that encoding data into a quantum circuit is a feature mapping, in the sense that it takes data into a new space: the Hilbert space of the qubits involved. The relationship between the dimensionality of the original feature space and that of the Hilbert space will depend on the circuit you use for encoding.
Z feature map
The $Z$ feature map (ZFM) can be interpreted as a natural extension of phase encoding. The ZFM consists of alternating layers of single-qubit gates: Hadamard gate layers and phase gate layers. Let the data vector $\vec{x}$ have $N$ features. The quantum circuit that performs the feature mapping is represented as a unitary operator that acts on the initial state:
$$\mathcal{U}_\text{ZFM}(\vec{x})\,|0\rangle^{\otimes N} = |\phi(\vec{x})\rangle,$$
where $|0\rangle^{\otimes N}$ is the $N$-qubit ground state. This notation is used for consistency with reference [4] Havlicek et al. The data features are mapped one-to-one with corresponding qubits. For example, if you have 8 features in a data vector, then you would use 8 qubits. The ZFM circuit is composed of repetitions of a subcircuit comprised of Hadamard gate layers and phase gate layers. A Hadamard layer is made up of a Hadamard gate acting on every qubit in an $N$-qubit register, $H^{\otimes N}$, within the same stage of the algorithm. This description also applies to a phase gate layer, in which the $k^\text{th}$ qubit is acted on by a phase gate $P$ parameterized by the feature $x_k$. Each $P$ gate has one feature $x_k$ as an argument, but the phase gate layer as a whole is a function of the data vector $\vec{x}$. The full ZFM circuit unitary with a single repetition is: