On quantum backpropagation, information reuse, and cheating measurement collapse

Amira Abbas; Robbie King; Hsin-Yuan Huang; William J. Huggins; Ramis Movassagh; Dar Gilboa; Jarrod R. McClean

TL;DR

This work analyzes whether parameterized quantum models can achieve backpropagation-like training efficiency. It introduces an online shadow tomography framework to reuse information across layers, showing that single-copy access cannot yield backpropagation scaling in general, while multi-copy access with gentle measurements can approach the classical gradient scaling in quantum resources, albeit with potentially exponential classical overhead. A concrete quantum-efficient protocol reduces quantum gradient costs to $O(M\,\mathrm{polylog}(M))$ operations at the expense of storing a hypothesis state, and its connection to shadow tomography demonstrates fundamental limits tied to efficiently learning observables. The paper also proves that fully gentle strategies alone cannot suffice due to Grover-type bounds, and discusses approximate schemes (e.g., tensor networks) as practical avenues. Overall, it clarifies the nuanced landscape of training large quantum models and motivates targeted architectural or approximation-based approaches for scalable quantum learning.

Abstract

The success of modern deep learning hinges on the ability to train neural networks at scale. Through clever reuse of intermediate information, backpropagation facilitates training through gradient computation at a total cost roughly proportional to running the function, rather than incurring an additional factor proportional to the number of parameters - which can now be in the trillions. Naively, one expects that quantum measurement collapse entirely rules out the reuse of quantum information as in backpropagation. But recent developments in shadow tomography, which assumes access to multiple copies of a quantum state, have challenged that notion. Here, we investigate whether parameterized quantum models can train as efficiently as classical neural networks. We show that achieving backpropagation scaling is impossible without access to multiple copies of a state. With this added ability, we introduce an algorithm with foundations in shadow tomography that matches backpropagation scaling in quantum resources while reducing classical auxiliary computational costs to open problems in shadow tomography. These results highlight the nuance of reusing quantum information for practical purposes and clarify the unique difficulties in training large quantum models, which could alter the course of quantum machine learning.
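The "additional factor proportional to the number of parameters" can be made concrete with the parameter-shift rule, the standard gradient method that Figure 2 compares against. The following is a minimal sketch, not the paper's algorithm: it assumes a hypothetical single-qubit circuit of alternating $R_X$ and $R_Y$ rotations measured in $Z$, simulated exactly with NumPy. The function names are illustrative. Each of the $M$ gradient entries needs two full circuit evaluations, and each evaluation applies all $M$ gates, so the total gate count grows as $M^2$.

```python
import numpy as np

# Pauli matrices and identity for a single qubit.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def rotation(pauli, theta):
    """Return exp(-i * theta / 2 * pauli) for a Pauli matrix."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * pauli


def expectation(thetas):
    """<Z> after applying alternating RX, RY rotations to |0> (toy model)."""
    state = np.array([1, 0], dtype=complex)
    for k, theta in enumerate(thetas):
        pauli = X if k % 2 == 0 else Y
        state = rotation(pauli, theta) @ state
    return float(np.real(np.vdot(state, Z @ state)))


def parameter_shift_gradient(thetas):
    """Gradient via the parameter-shift rule; also counts circuit evaluations."""
    thetas = np.asarray(thetas, dtype=float)
    grad = np.zeros_like(thetas)
    evaluations = 0
    for k in range(len(thetas)):
        shift = np.zeros_like(thetas)
        shift[k] = np.pi / 2
        # Two full circuit runs per parameter: nothing is reused between them.
        grad[k] = 0.5 * (expectation(thetas + shift) - expectation(thetas - shift))
        evaluations += 2
    return grad, evaluations


grad, n_evals = parameter_shift_gradient([0.3, -1.1, 0.7, 2.0])
print("gradient:", grad)
print("circuit evaluations:", n_evals)
```

On hardware each evaluation would additionally be repeated many times to estimate the expectation value to precision $\varepsilon$; the exact simulation here leaves that factor out.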

Paper Structure (32 sections, 18 theorems, 58 equations, 2 figures, 3 algorithms)

Introduction
Backpropagation scaling
Variational quantum models
Learning algorithms without quantum memory
Reusing multiple copies through gentle measurement
A quantum-efficient protocol for backpropagation
Reduction to shadow tomography
A fully gentle gradient strategy
Approximate schemes
Discussion
Resource scaling for quantum backpropagation methods
Memory complexity of the function
Current gradient methods
Naive sampling
Fast gradient algorithm
...and 17 more sections

Key Result

Proposition 3

Given the quantum data setting where one seeks to train a variational model using copies of the unknown state $\rho$ and the additional constraint of no quantum memory, then backpropagation scaling is not possible in the general case.

Figures (2)

Figure 1: Quantum backpropagation algorithm. Our proposal for quantum backpropagation consists of an online shadow tomography protocol coupled with a threshold search procedure [aaronson2018online, buadescu2021improved]. The algorithm is executed in batches of size $O(\mathrm{polylog}(M))$, of which roughly $n$ batches are needed, where $\rho$ is an $n$-qubit quantum state. A classically constructed hypothesis state $\sigma$ is also necessary for the algorithm. Crucially, the quantum states and the hypothesis state are rotated before each threshold check, in order to step through the layers of a quantum neural network $F(\theta),\ \theta \in \mathbb{R}^M$ and reuse information for gradients. This enables a cost reduction from $O(M^2 \cdot \mathrm{polylog}(M))$ to $O(M \cdot \mathrm{polylog}(M))$ to compute the full gradient. For convenience, we suppress precision factors, which scale as $O(1/\varepsilon^4)$ for this proposal.
Figure 2: Quantum backpropagation scaling. The parameter-shift rule is plotted alongside true quantum backpropagation scaling. The $x$-axis shows the time in seconds required to compute a single estimate of the gradient, in log scale, with common time points stated explicitly. The $y$-axis shows the number of parameters, also in log scale, that may be optimized using each method in a given amount of time. We make simple assumptions, motivated by the work in [babbush2021focus]. Namely, we assume a minimum system size of $n = 100$ qubits. Further, assuming a favourable time of $10\,\mu\mathrm{s}$ to compute one parameterised operation (which is one order of magnitude less than the time to compute one Toffoli gate), the time for one primitive is lower bounded by $T_q = 100 \times 10\,\mu\mathrm{s} = 1\,\mathrm{ms}$. Scaling in time is then roughly $M^2 \cdot T_q$ for the parameter-shift rule and $M \cdot \mathrm{polylog}(M) \cdot T_q$ for quantum backpropagation. We further take $\varepsilon = O(1)$.
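The timing model in the Figure 2 caption can be reproduced with a few lines of arithmetic. The sketch below is illustrative rather than the authors' plotting code: the function names are hypothetical, and because the caption does not state the exponent of the polylog factor, it is assumed here to be $\log_2(M)$ to the first power (`POLYLOG_POWER = 1`). Constant factors are ignored, as in the caption.

```python
import math

# Assumptions taken from the Figure 2 caption.
N_QUBITS = 100            # minimum system size n
T_OP = 10e-6              # seconds per parameterised operation (10 microseconds)
T_Q = N_QUBITS * T_OP     # lower bound on time per primitive, about 1 ms

# Assumption NOT in the source: the polylog factor is log2(M) ** POLYLOG_POWER.
POLYLOG_POWER = 1


def parameter_shift_time(m):
    """Rough time in seconds for one gradient estimate: M^2 * T_q."""
    return m ** 2 * T_Q


def backprop_time(m):
    """Rough time in seconds for one gradient estimate: M * polylog(M) * T_q."""
    return m * max(1.0, math.log2(m)) ** POLYLOG_POWER * T_Q


def max_parameters(time_fn, budget_s):
    """Largest M with time_fn(M) <= budget_s, assuming time_fn is increasing."""
    if time_fn(1) > budget_s:
        return 0
    lo = 1
    while time_fn(2 * lo) <= budget_s:   # exponential search for an upper bound
        lo *= 2
    hi = 2 * lo
    while hi - lo > 1:                   # bisection between lo (ok) and hi (too slow)
        mid = (lo + hi) // 2
        if time_fn(mid) <= budget_s:
            lo = mid
        else:
            hi = mid
    return lo


for label, seconds in [("1 hour", 3600.0), ("1 day", 86400.0), ("1 year", 3.15e7)]:
    m_ps = max_parameters(parameter_shift_time, seconds)
    m_bp = max_parameters(backprop_time, seconds)
    print(f"{label}: parameter-shift M <= {m_ps}, backpropagation scaling M <= {m_bp}")
```

Changing `POLYLOG_POWER` moves the backpropagation curve but not the qualitative gap: the parameter-shift budget grows only as the square root of the available time.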

Theorems & Definitions (38)

Definition 1: Backpropagation scaling
Definition 2: Simple variational model
Proposition 3: Backpropagation scaling is impossible for quantum data using single copies
proof
Remark 4: Current gradient methods fail to achieve backpropagation scaling
Proposition 5: Classical analogue achieves backpropagation scaling
Definition 6: Gentle measurement
Proposition 7: A special case variational model achieves backpropagation scaling
proof
Definition 8: Quantum neural network
...and 28 more
