Barren Plateaus Beyond Observable Concentration

Zi-Shen Li; Bujiao Wu; Xiao-Wei Li; Man-Hong Yung

Abstract

Parameterized quantum circuits (PQCs) are central to quantum machine learning and near-term quantum simulation, but their scalability is often hindered by barren plateaus (BPs), where gradients decay exponentially with system size. Prior explanations, including expressivity, entanglement, locality, and noise, are often presented in ways that conflate two distinct issues: concentration of the measured observable and loss of parameter sensitivity caused by circuit dynamics. We develop a unified statistical framework that separates these mechanisms. We show that several standard BP explanations, including locality- and entanglement-related effects, can be understood through a single phenomenon that we term observable concentration (OC). Importantly, we prove that avoiding OC is necessary but not sufficient for trainability. Beyond OC, we identify two distinct mid-circuit sources of gradient suppression. First, mid-circuit information loss occurs when parameter perturbations propagate into degrees of freedom that are inaccessible to the final measurement, yielding little or no response. Second, mid-circuit information scrambling occurs when local perturbations rapidly spread across the system and become effectively undetectable on the measured subsystem. We support our theory with explicit constructions and numerical evidence, including quantum convolutional neural network architectures that exhibit information-loss-induced barren plateaus despite the absence of observable concentration.

Paper Structure (18 sections, 3 theorems, 63 equations, 5 figures)

Introduction
Background and Preliminaries
BPs Induced by Information Loss
BPs Induced by Information Scrambling
Numerical Simulation
Conclusion and Discussion
Methods
Data Availability
Code Availability
Acknowledgements
Author Contributions
Competing Interests

Key Result

Theorem 1

For a locally scrambling ensemble $\mathcal{E}$ and a state $\rho = U \rho_0 U^\dagger$ with $U \sim \mathcal{E}$, the second moment of the observable $g$ is bounded by: where $A = \mathrm{supp}(g)$ denotes the support of the Pauli operator $g$, $\rho_A$ is the reduced density matrix of $\rho$ on subsystem $A$, and $D_{\rm HS}(\rho_A):=\Vert{\rho_A-\mathds{1}/2^{|A|}}\Vert_{\rm F}$ denotes the Hi
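As a minimal sketch of the quantity entering this bound, the snippet below computes $D_{\rm HS}(\rho_A)=\Vert\rho_A-\mathds{1}/2^{|A|}\Vert_{\rm F}$ for a small density matrix with NumPy. The helper names `reduced_density_matrix` and `d_hs`, the qubit ordering (qubit 0 is the most significant bit), and the two test states are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def reduced_density_matrix(rho, keep, n):
    """Partial trace of an n-qubit density matrix, keeping the qubits in `keep`.

    Assumed convention: qubit 0 is the most significant bit of the basis index.
    """
    t = rho.reshape([2] * (2 * n))  # axes: (row qubits 0..n-1, column qubits 0..n-1)
    m = n                           # number of qubits still present in `t`
    # Trace out unwanted qubits from the highest index down so axis positions stay valid.
    for q in sorted(set(range(n)) - set(keep), reverse=True):
        t = np.trace(t, axis1=q, axis2=q + m)
        m -= 1
    d = 2 ** len(keep)
    return t.reshape(d, d)

def d_hs(rho_a):
    """Frobenius-norm distance between rho_a and the maximally mixed state."""
    d = rho_a.shape[0]
    return np.linalg.norm(rho_a - np.eye(d) / d, ord="fro")

# Two-qubit examples: a product state |00> and a Bell state (|00> + |11>)/sqrt(2).
product = np.array([1, 0, 0, 0], dtype=complex)
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

rho_product = np.outer(product, product.conj())
rho_bell = np.outer(bell, bell.conj())

d_product = d_hs(reduced_density_matrix(rho_product, [0], 2))
d_bell = d_hs(reduced_density_matrix(rho_bell, [0], 2))
print(d_product, d_bell)
```

With $A$ taken as the first qubit, the product state gives $D_{\rm HS}=\sqrt{1/2}$, while the Bell state gives $D_{\rm HS}=0$ because its single-qubit marginal is maximally mixed.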

Figures (5)

Figure 1: Demonstration of information loss and gradient vanishing. (a) depicts information loss in variational quantum circuits. (b) gives a simple illustration of how this information loss results in a vanishing response. (c) shows the different statistical behaviors of BPs with and without observable concentration (OC).
Figure 2: Circuit models to demonstrate mid-circuit response. $\mathcal{H}_{1},\mathcal{H}_2$ and $\mathcal{H}_3$ denote different Hilbert spaces with dimensions $d_1$, $d_2$, and $d_3$ respectively. The measurement is taken within $\mathcal{H}_2\otimes\mathcal{H}_3$.
Figure 3: Hierarchical tree circuit model (linear depth). This quantum circuit consists of $L$ layers of single-qubit and two-qubit parameterized gates. The qubits are indexed sequentially from bottom to top as $1,2,\dots,n$. Similarly, the layers are labeled consecutively from left to right as $1,2,\dots,L$.
Figure 4: Hierarchical tree-circuit trainability diagnostics via parameter-shift statistics. (a)-(f) present scatter plots of the shifted terms in circuits containing $3,5,7,9,11$, and $13$ qubits, respectively. (g) illustrates how the gradient variance vanishes with an increasing number of qubits; the gray dashed line is a linear fit on a logarithmic scale. (h) compares $(1-r)$ with the observable variance, with a straight-line fit for $(1-r)$, also on a logarithmic scale.
Figure 5: Circuit models to demonstrate the impact of information scrambling on BPs. The measurement is applied within the joint space $\mathcal{H}_1\otimes\mathcal{H}_2$.
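
The "shifted terms" in the Figure 4 caption presumably refer to the two evaluations used by the standard parameter-shift rule. As a hedged illustration, the sketch below applies that rule to a single-qubit toy circuit, $f(\theta)=\langle 0|R_Y(\theta)^\dagger Z R_Y(\theta)|0\rangle=\cos\theta$. The circuit, the function names, and the chosen angle are assumptions for illustration and are not the hierarchical tree circuit studied in the paper.

```python
import numpy as np

# Pauli matrices used by the toy circuit (assumed example, not the paper's model).
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(theta):
    """Single-qubit rotation exp(-i * theta * Y / 2)."""
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * Y

def expectation(theta):
    """f(theta) = <0| RY(theta)^dagger Z RY(theta) |0>, which equals cos(theta)."""
    psi = ry(theta) @ np.array([1, 0], dtype=complex)
    return float(np.real(psi.conj() @ Z @ psi))

def parameter_shift_grad(f, theta):
    """Return the gradient estimate together with the two shifted terms."""
    plus = f(theta + np.pi / 2)
    minus = f(theta - np.pi / 2)
    return 0.5 * (plus - minus), plus, minus

theta = 0.3
grad, plus, minus = parameter_shift_grad(expectation, theta)
print(grad, -np.sin(theta))  # the two values agree up to floating-point error
```

For gates generated by a Pauli operator, the rule is exact, so the gradient is fully determined by the pair of shifted values. Plotting such pairs against each other over many random parameter draws is one way to produce scatter plots of the kind described in panels (a)-(f).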

Theorems & Definitions (11)

Definition 1: Barren plateaus
Definition 2: Locally scrambling ensemble
Theorem 1: Observable Concentration from Local Scrambling
Theorem 2: Information loss and barren plateaus
Example 1
Corollary 3: Batch-decomposed form of \ref{obs:mid-circ}
Example 2
