Almost fault-tolerant quantum machine learning with drastic overhead reduction

Haiyue Kang, Younghun Kim, Eromanga Adermann, Martin Sevior, Muhammad Usman

TL;DR

The paper tackles the challenge of training quantum machine learning (QML) models under realistic hardware noise, given the prohibitive resource cost of full quantum error correction, which is dominated by magic-state distillation. It introduces partial quantum error correction (QEC), which error-corrects the two-qubit Clifford gates (Controlled-$Z$) while leaving single-qubit gates uncorrected, yielding dramatic spacetime overhead reductions. Using MNIST-based quantum variational classifiers (QVCs), it demonstrates trainability under depolarizing single-qubit noise up to $p\approx 1.47\times10^{-3}$ (net gate error about $1.96\times10^{-3}$). Under phase-damping noise, trainability is independent of the mean over-rotation angle, and thermal damping can even improve performance. The results indicate that partial QEC can deliver trainable, high-accuracy QML with orders-of-magnitude lower overhead than distillation-based fault tolerance, offering a practical pathway for learning on noisy devices.

Abstract

Errors in the current generation of quantum processors pose a significant challenge towards practical-scale implementations of quantum machine learning (QML) as they lead to trainability issues arising from noise-induced barren plateaus, as well as performance degradations due to the noise accumulation in deep circuits even when QML models are free from barren plateaus. Quantum error correction (QEC) protocols are being developed to overcome hardware noise, but their extremely high spacetime overheads, mainly due to magic state distillation, make them infeasible for near-term practical implementation. This work proposes the idea of partial quantum error correction (QEC) for quantum machine learning (QML) models and identifies a sweet spot where distillations are omitted to significantly reduce overhead. By assuming error-corrected two-qubit Controlled-$Z$s (Clifford operations), we demonstrate that the QML models remain trainable even when single-qubit gates are subjected to $\approx0.2\%$ depolarizing noise, corresponding to a gate error rate of $\approx0.13\%$ under randomized benchmarking. Further analysis based on various noise models, such as phase-damping and thermal-dissipation channels at low temperature, indicates that the QML models are trainable independent of the mean angle of over-rotation, or can even be improved by thermal damping that purifies a quantum state away from depolarizations. While it may take several years to build quantum processors capable of fully fault-tolerant QML, our work proposes a resource-efficient solution for trainable and high-accuracy QML implementations in noisy environments.
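The conversion between depolarizing strength and gate error quoted in the abstract can be checked with a short calculation. The sketch below assumes the Pauli-form single-qubit depolarizing channel $\mathcal{E}(\rho)=(1-p)\rho+\frac{p}{3}(X\rho X+Y\rho Y+Z\rho Z)$, a convention the abstract does not state, and takes the randomized-benchmarking gate error to be the average gate infidelity $r = 1-F_{\text{avg}}$, computed with the standard Kraus-operator formula $F_{\text{avg}}=\left(d+\sum_k|\mathrm{Tr}\,K_k|^2\right)/\left(d(d+1)\right)$. The function names are illustrative only.

```python
import math

def depolarizing_kraus(p):
    """Kraus operators of the ASSUMED single-qubit channel
    E(rho) = (1 - p) rho + (p / 3) (X rho X + Y rho Y + Z rho Z)."""
    identity = [[1, 0], [0, 1]]
    pauli_x = [[0, 1], [1, 0]]
    pauli_y = [[0, -1j], [1j, 0]]
    pauli_z = [[1, 0], [0, -1]]
    weights = [math.sqrt(1 - p)] + [math.sqrt(p / 3)] * 3
    paulis = (identity, pauli_x, pauli_y, pauli_z)
    return [[[w * entry for entry in row] for row in op]
            for w, op in zip(weights, paulis)]

def average_gate_infidelity(kraus_ops, dim=2):
    """r = 1 - F_avg, with F_avg = (dim + sum_k |Tr K_k|^2) / (dim * (dim + 1))."""
    trace_sq = sum(abs(sum(op[i][i] for i in range(dim))) ** 2
                   for op in kraus_ops)
    return 1 - (dim + trace_sq) / (dim * (dim + 1))

p = 2e-3  # about 0.2% depolarizing strength, as quoted in the abstract
r = average_gate_infidelity(depolarizing_kraus(p))
print(f"p = {p:.2e} -> average gate error r = {r:.2e}")
```

Under this convention only the identity Kraus operator has a non-zero trace, so the formula reduces to $r = 2p/3$, which gives about $0.13\%$ for $p\approx0.2\%$, in line with the figures in the abstract. A different parametrization of the depolarizing channel would change the prefactor.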

Paper Structure

This paper contains 15 sections, 45 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: (a) An illustration of the major obstacles facing a quantum variational classifier at different scales of circuit depth and qubit number. The model becomes non-trainable as the number of qubits grows, as indicated by the barren plateaus in the orange-shaded region. When the circuit depth scales faster than $O(\log(n))$, the quantum circuit is easily overwhelmed by noise, leading to noise-induced barren plateaus, as indicated by the blue-shaded region. In contrast, the circuit can be efficiently simulated classically if the circuit depth scales more slowly than $O(\log(n))$. If only a few qubits are involved, the circuit also lacks the expressibility needed to encode the state and demonstrate the utility of quantum computation. (b) Example of a well-behaved loss landscape for a trainable quantum neural network, where the function values and parameter values are represented on the vertical and horizontal axes, respectively. (c) The typical landscape for conventional barren plateaus without any noise, where the gradients vanish almost everywhere except in some small regions of well-behaved parameters, typically around the minima [BP_narrow_gorges, barren_plateaus, barren_plateaus_summary]. (d) The landscape for noise-induced barren plateaus, where the gradients vanish everywhere, without exception.
  • Figure 2: (a) Demonstration of how logical operators are implemented on the patch-based surface codes introduced in Ref. [surface_codes]. White dots indicate data qubits and black dots indicate syndrome-extraction qubits, with orange and blue strips representing Pauli-$X$ and Pauli-$Z$ stabilizers, respectively. For operators in the Clifford group, including the $X$, $Z$ and $CX$ gates, the logical operators can be built directly from many physical operators, without the need for ancilla qubits [surface_codes, qec_lattice_surgery]. For operators outside the Clifford group, such as the $T$ gate, the logical operator $T_L$ cannot be implemented directly, but must be teleported from an ancilla logical qubit in the state $\ket{T_L}=\frac{1}{\sqrt{2}}(\ket{0_L}+e^{i\pi/4}\ket{1_L})$ via magic state injection. The state $\ket{T_L}$ must first be prepared from a single physical qubit in the state $\ket{T}$, followed by stabilizer measurements [qec_lattice_surgery]. Alternatively, one could carry out magic state distillation at a very high spacetime cost. (b) Without the redundancy of encoding one logical operator in multiple physical operators, the logical error rate for $T$ gates is comparable to the physical $T$ error rate. However, if the state is distilled properly, the logical gate error rate can be suppressed to the same level as that of the Clifford gates.
  • Figure 3: (a) General workflow of a quantum neural network for classification tasks. 1. One first prepares a set of data-label pairs for training, after which 2. the data (e.g. an array of pixel values of an image) is encoded onto a quantum state $\ket{\psi(\bm{x}_i)}=\mathcal{C}(\bm{x}_i)\ket{0}^{\otimes n}$. The encoded state evolves through a sequence of parameterized unitaries followed by measurements on each qubit, and the observable expectation values correspond to the predicted probabilities of the respective labels. Consequently, the predicted label is given by the index of the qubit with the highest expectation value. 3. The predicted probabilities are compared with the true probabilities obtained from the actual label through the cross-entropy loss function, which quantitatively evaluates the inference performance. 4. The parameters are updated iteratively using the gradient descent method. (b) The detailed design of the variational circuit presented in this paper. The data is encoded into the circuit through amplitude encoding to minimize the number of qubits required and to conserve memory for simulation. The parameterized unitary consists of multiple layers, each containing a sequence of single-qubit rotations with parameterized rotation axes and angles, denoted $R_{lm}$ as shorthand for $R_{\bm{n}_{lm}}(\theta_{lm})$, followed by a sequence of entangling Controlled-$Z$ gates without trainable parameters. An error channel $\mathcal{E}$ is added after every ideal gate $R_{lm}$; together they constitute the noisy gate $\tilde{R}_{lm}$. The inferred probabilities for each potential label are evaluated by measuring the Pauli-$Z$ expectation values on each of the qubits.
  • Figure 4: The performance of the QVC in MNIST number classification problems with (a) 75 quantum layers but without a classical layer, and (b) 75 layers with a fully-connected classical layer and noisy two-qubit gates. The classification success rates inferred from the test datasets are plotted against the total number of images trained. The noise-free simulation is highlighted with thick black lines to contrast with the noisy ones. Since we employ amplitude encoding, the number of pixels per image is exactly $2^n$, corresponding to an image of dimension $\sqrt{2^{n}}\times \sqrt{2^{n}}$, where $n$ is the number of qubits in the variational circuit. The legend denotes the strengths of the depolarizing channel, from 0 to $5.11\times 10^{-3}$. (c) The corresponding cost function values (left axis) and their average squared gradients (right axis), $\mathbb{E}_k\left((\nabla_{\theta_k}\mathcal{L})^2\right)$, over the trainable parameters (classical and quantum), where the arrows indicate which axis each plot belongs to. A clear trend of flattening loss landscapes and vanishing gradients can be observed as the depolarizing strength increases. When the gradients drop to a scale of around $10^{-8}$, i.e. at $p_{\text{depol}}=5.11\times10^{-3}$, the model becomes distinctly untrainable due to shot noise. (d) For the noisy two-qubit gate plot where the classical layer is included, the cost function gradients averaged over all parameters, including both the quantum and classical layers, actually increase as the model is trained on more images. In contrast, the classical layer gradients by themselves behave as expected, as shown in the zoomed-in figure.
  • Figure 5: The squared gradients, averaged over all trainable parameters and all training iterations, versus the depolarizing strength $p$ on a logarithmic scale. Error bars show the standard deviation of the mean over all iterations.
  • ...and 6 more figures
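
The workflow described in the Figure 3 caption (amplitude encoding, layers of single-qubit rotations and Controlled-$Z$ gates, Pauli-$Z$ readout, cross-entropy loss) can be sketched as a small noiseless state-vector simulation. This is a minimal illustration under several assumptions, not the authors' implementation: the rotations are restricted to $R_y$ rather than a parameterized axis, the Controlled-$Z$ gates act on a nearest-neighbour chain, the $\langle Z\rangle$ values are turned into probabilities with a softmax, the error channel $\mathcal{E}$ is omitted (it would require a density-matrix or sampling simulation), and all function names, pixel values and angles are hypothetical.

```python
import math

def amplitude_encode(pixels):
    """Normalize 2**n pixel values into the amplitudes of an n-qubit state."""
    norm = math.sqrt(sum(v * v for v in pixels))
    return [complex(v / norm) for v in pixels]

def apply_single_qubit(state, gate, qubit):
    """Apply a 2x2 gate to `qubit` (qubit 0 is the least significant bit)."""
    new_state = list(state)
    mask = 1 << qubit
    for idx in range(len(state)):
        if (idx & mask) == 0:
            a, b = state[idx], state[idx | mask]
            new_state[idx] = gate[0][0] * a + gate[0][1] * b
            new_state[idx | mask] = gate[1][0] * a + gate[1][1] * b
    return new_state

def apply_cz(state, q1, q2):
    """Controlled-Z: flip the sign of amplitudes where both qubits are 1."""
    both = (1 << q1) | (1 << q2)
    return [-amp if (idx & both) == both else amp
            for idx, amp in enumerate(state)]

def ry(theta):
    """Rotation about the y axis, an assumed stand-in for the gate R_lm."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def z_expectations(state, n_qubits):
    """Pauli-Z expectation value on each qubit."""
    probs = [abs(amp) ** 2 for amp in state]
    return [sum(pr if ((idx >> q) & 1) == 0 else -pr
                for idx, pr in enumerate(probs))
            for q in range(n_qubits)]

def cross_entropy(z_values, label):
    """Softmax over the <Z> values (an assumption), then cross-entropy."""
    exps = [math.exp(z) for z in z_values]
    return -math.log(exps[label] / sum(exps))

n_qubits = 2
pixels = [0.0, 1.0, 2.0, 3.0]       # toy 2x2 image: 2**n pixels
thetas = [[0.3, -0.7], [1.1, 0.2]]  # thetas[l][m]: layer l, qubit m

state = amplitude_encode(pixels)
for layer in thetas:
    for m, theta in enumerate(layer):
        state = apply_single_qubit(state, ry(theta), m)
    for m in range(n_qubits - 1):
        state = apply_cz(state, m, m + 1)

z_values = z_expectations(state, n_qubits)
predicted = max(range(n_qubits), key=lambda q: z_values[q])
loss = cross_entropy(z_values, label=0)
print(z_values, predicted, loss)
```

In a training loop, the angles in `thetas` would be updated by gradient descent on `loss`, which corresponds to step 4 of the workflow. Because one qubit is read out per label, an $n$-qubit circuit of this form can distinguish at most $n$ classes.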