Backpropagation scaling in parameterised quantum circuits

Joseph Bowles; David Wierichs; Chae-Yeun Park

TL;DR

The paper addresses the costly gradient evaluation in parameterised quantum circuits by proposing structured circuit classes that enable backpropagation-like scaling. It introduces commuting-generator and commuting-block circuits, showing that gradients (and higher-order derivatives and the Fisher information) can be estimated with substantially fewer circuit evaluations, sometimes matching classical backpropagation in cost. Explicit constructions (X-generator and nonlocal-generator circuits) illustrate how to realize parallel gradient estimation and increased expressivity, while numerical experiments on a 16-qubit bars-and-dots task demonstrate order-of-magnitude reductions in required shots and competitive performance. The work highlights a path toward scalable quantum machine learning through carefully designed circuit architectures that leverage commutation relations and symmetry, with implications for training efficiency and practical quantum advantage. It also discusses limitations, simulability considerations, and future directions in exploring expressivity limits and broader applications.

Abstract

The discovery of the backpropagation algorithm ranks among the most important moments in the history of machine learning, and has made possible the training of large-scale neural networks through its ability to compute gradients at roughly the same computational cost as model evaluation. Despite its importance, a similar backpropagation-like scaling for gradient evaluation of parameterised quantum circuits has remained elusive. Currently, the most popular method requires sampling from a number of circuits that scales with the number of circuit parameters, making training of large-scale quantum circuits prohibitively expensive in practice. Here we address this problem by introducing a class of structured circuits that are not known to be classically simulable and admit gradient estimation with significantly fewer circuits. In the simplest case -- for which the parameters feed into commuting quantum gates -- these circuits allow for fast estimation of the gradient, higher order partial derivatives and the Fisher information matrix. Moreover, specific families of parameterised circuits exist for which the scaling of gradient estimation is in line with classical backpropagation, and can thus be trained at scale. In a toy classification problem on 16 qubits, such circuits show competitive performance with other methods, while reducing the training cost by about two orders of magnitude.

Paper Structure (27 sections, 6 theorems, 74 equations, 10 figures, 1 table)

Introduction
Backpropagation scaling in parameterised quantum circuits
Commuting-generator circuits
Higher order partial derivatives
Fisher information matrix
Simulability
Commuting-block circuits
Increased expressivity with commuting-block circuits
Gradient scaling
Explicit constructions of commuting circuits
X-generator ansatz
Circuits with nonlocal generators
Numerical study: learning translationally invariant data
The learning problem: learning bars and dots
...and 12 more sections

Key Result

Theorem 1

Consider a commuting-generator circuit $C(\boldsymbol{\theta})$ of the form ansatz. Then an unbiased estimator of the gradient can be obtained by classically post-processing a single circuit $C'$ with the same number of qubits as $C$. As the measurement statistics of $C'$ are used to estimate all derivatives simultaneously, the variance of each derivative estimator scales as $\mathcal{O}(1/M)$, w
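The identity behind this result can be checked numerically on a small example. The sketch below is illustrative only: the 3-qubit generators, the observable, the random unitary standing in for $V$, and all function names are assumptions made for this example, not taken from the paper. It uses exact state vectors rather than sampled shots, and it does not model the diagonalising unitary or the classical post-processing. It compares parameter-shift derivatives (two circuit evaluations per parameter) with the expectation values of the operators $i[G_j, \mathcal{H}]$ on the single output state of $C(\boldsymbol{\theta})$. These operators equal $2iG_j\mathcal{H}$ when $G_j$ anticommutes with $\mathcal{H}$ and vanish when it commutes.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_string(ops):
    """Tensor product of single-qubit operators, e.g. [X, I2, X]."""
    return reduce(np.kron, ops)

def exp_pauli(theta, G):
    """exp(-i theta G) for an operator G that squares to the identity."""
    return np.cos(theta) * np.eye(G.shape[0]) - 1j * np.sin(theta) * G

# Hypothetical example: mutually commuting X-type generators, Z-type observable.
generators = [
    pauli_string([X, I2, I2]),  # anticommutes with H
    pauli_string([X, X, I2]),   # anticommutes with H
    pauli_string([X, X, X]),    # anticommutes with H
    pauli_string([I2, X, X]),   # commutes with H, so its derivative is zero
]
H = pauli_string([Z, I2, I2])

# Random unitary standing in for the arbitrary unitary V (assumption).
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
V, _ = np.linalg.qr(A)
psi0 = V[:, 0]  # V applied to |000>

def output_state(thetas):
    psi = psi0
    for t, G in zip(thetas, generators):
        psi = exp_pauli(t, G) @ psi
    return psi

def cost(thetas):
    psi = output_state(thetas)
    return np.real(np.vdot(psi, H @ psi))

def parameter_shift_gradient(thetas):
    """Two shifted evaluations per parameter (shift pi/4 for exp(-i theta G))."""
    grad = np.zeros(len(thetas))
    for j in range(len(thetas)):
        shift = np.zeros(len(thetas))
        shift[j] = np.pi / 4
        grad[j] = cost(thetas + shift) - cost(thetas - shift)
    return grad

# Gradient observables i[G_j, H]; all are evaluated on the same output state.
observables = [1j * (G @ H - H @ G) for G in generators]

def single_state_gradient(thetas):
    psi = output_state(thetas)
    return np.array([np.real(np.vdot(psi, O @ psi)) for O in observables])

thetas = rng.uniform(0, 2 * np.pi, size=len(generators))
grad_shift = parameter_shift_gradient(thetas)
grad_single = single_state_gradient(thetas)
all_commute = all(
    np.allclose(A_ @ B_, B_ @ A_) for A_ in observables for B_ in observables
)
print(np.allclose(grad_shift, grad_single), all_commute)
```

Because the gradient observables mutually commute, they share an eigenbasis, which is what allows every derivative to be read off from measurement outcomes of one circuit instead of from two shifted circuits per parameter.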

Figures (10)

Figure 1: (a) The simplest type of circuit class we consider, which we call commuting-generator circuits. An arbitrary unitary $V$ is applied, followed by a parameterised quantum circuit composed of gates $\exp(-i\theta_j G_j)$ with mutually commuting generators $G_j$. An observable $\mathcal{H}$ is measured on some subset of the output qubits, and for each $G_j$ either (i) $\mathcal{H}$ commutes with $G_j$ or (ii) $\mathcal{H}$ anticommutes with $G_j$. (b) The standard method to estimate the gradient of quantum circuits is via parameter-shift rules. For gate generators with two distinct eigenvalues, the partial derivative with respect to each parameter is evaluated by estimating the difference between two circuits, where the parameter in question has been shifted. The resources required by this method can therefore be much larger than those needed to estimate $C(\boldsymbol{\theta})$, making it prohibitively expensive for circuits with thousands of parameters. (c) The corresponding circuit used for parallel gradient estimation. A unitary $D_{\text{odd}}$ (or $D_{\text{even}}$) is applied to the original circuit and the circuit is sampled in the computational basis $M$ times. All odd (or even) partial derivatives can then be estimated to additive error $\epsilon = \mathcal{O}(\frac{1}{\sqrt{M}})$ by classically post-processing the outcomes. In the case that all generators and the observable are stabiliser operators, the unitaries $D_{\text{even}}$, $D_{\text{odd}}$ are Clifford unitaries and the post-processing consists of evaluating expectation values of subsets of qubits only.
Figure 2: (a) The commuting-block circuit ansatz of Thm. \ref{thm:blocks}. The blocks $U_j(\boldsymbol{\theta}_j)$ are commuting blocks with the same constraints as in Fig. \ref{fig:mainfig}. Generators between blocks have a fixed commutation relation: for any pair of blocks $U_j, U_k$ either (i) all generators from block $j$ commute with all generators from block $k$, or (ii) all generators from block $j$ anticommute with all generators from block $k$. (b) The circuit used to estimate the partial derivatives of those generators $G_j$ of block $b$ that anticommute with $\mathcal{H}$. Here $D$ is the unitary that diagonalises the operators $\{2iG_j\mathcal{H}\}$, and the unitaries $W_b$ and $\tilde{W}_b$ are defined in \ref{Wbdef} and \ref{Wbtildedef}. For generators $G_j$ that commute with $\mathcal{H}$, one replaces $W_b$ by $iW_b$ and $D$ diagonalises the operators $\{2G_j\mathcal{H}\}$.
Figure 3: An instance of an $X$-generator circuit for 5 qubits (left) and the corresponding circuit used for gradient estimation (right). The parameterised gates are generated by products of Pauli $X$ operators on different subsets of qubits and the observable is a product of Pauli $Z$ operators on a subset of qubits. To estimate the gradient, an additional circuit $D$ is added, which involves performing a controlled $Z$ gate between all qubits on which the observable acts non-trivially. Here $H_{xy}$ is the single-qubit Hadamard unitary that switches between the $X$ and $Y$ bases. To estimate the gradient of one of the gates (highlighted in yellow), one evaluates the expectation value of the generator at the output of the circuit. Since all generators are products of $X$, one can estimate the gradient in parallel by measuring each qubit in the $X$ basis.
Figure 4: A commuting-generator circuit family with nonlocal generators and Hamiltonian (left). The similarity in structure between the Hamiltonian and the generators allows all gradient entries to be measured in parallel without an additional basis change, simply by measuring in the computational basis and post-processing the samples into expectation values of products of Pauli $Z$ operators (right).
Figure 5: (a) We generate two types of 16-dimensional vectors corresponding to either bars (label 1) or dots (label 2). Independent Gaussian noise is added to generate the input data for the classification task. Note that the labels of the data set are invariant to translations of the elements of the vectors. (b--c) The two translation equivariant models we benchmark for this problem. Here '+ sym' denotes symmetrisation of the generator over 1D translations, e.g. $X_1 + \text{sym} = X_1 + X_2 + \cdots + X_n$. The model in (b) features $X$ generators only, and can therefore exploit the parallel gradient evaluation described in Sec. \ref{sec:xgen}. The model in (c) is a translation equivariant version of the model in \cite{schatzki2022theoretical} and features non-commuting generators.
...and 5 more figures
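
The '+ sym' symmetrisation in the Figure 5 caption can be made concrete with a small sketch. It assumes cyclic (periodic) 1D translations on 4 qubits, which the caption does not specify; the helper names `x_on`, `symmetrise` and `translation` are hypothetical. It builds symmetrised $X$-type generators, checks that $X_1 + \text{sym}$ equals $X_1 + X_2 + \cdots + X_n$, and confirms that the results are invariant under the translation operator and commute with each other.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def x_on(qubits, n):
    """Product of Pauli X operators on the given qubits of an n-qubit register."""
    return reduce(np.kron, [X if q in qubits else I2 for q in range(n)])

def symmetrise(qubits, n):
    """Sum of the X-string over all n cyclic translations (assumed periodic)."""
    return sum(x_on({(q + s) % n for q in qubits}, n) for s in range(n))

def translation(n):
    """Permutation matrix that shifts every qubit one site to the right, cyclically."""
    dim = 2 ** n
    T = np.zeros((dim, dim))
    for idx in range(dim):
        # Qubit 0 is the most significant bit, matching the np.kron ordering.
        bits = [(idx >> (n - 1 - q)) & 1 for q in range(n)]
        shifted = bits[-1:] + bits[:-1]
        new_idx = int("".join(map(str, shifted)), 2)
        T[new_idx, idx] = 1.0
    return T

n = 4
T = translation(n)
G_single = symmetrise({0}, n)    # X_1 + sym
G_pair = symmetrise({0, 1}, n)   # X_1 X_2 + sym

matches_sum = np.allclose(G_single, sum(x_on({q}, n) for q in range(n)))
invariant = all(np.allclose(T @ G @ T.T, G) for G in (G_single, G_pair))
commuting = np.allclose(G_single @ G_pair, G_pair @ G_single)
print(matches_sum, invariant, commuting)
```

Since every symmetrised generator here is a sum of $X$-strings, the generators still commute with one another, which is the property the model in panel (b) relies on for parallel gradient evaluation.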

Theorems & Definitions (7)

Definition 2.1: backpropagation scaling
Theorem 1
Corollary 1
Theorem 2
Corollary 2
Corollary 3
Theorem 3
