Barren Plateaus Beyond Observable Concentration

Zi-Shen Li, Bujiao Wu, Xiao-Wei Li, Man-Hong Yung

Abstract

Parameterized quantum circuits (PQCs) are central to quantum machine learning and near-term quantum simulation, but their scalability is often hindered by barren plateaus (BPs), where gradients decay exponentially with system size. Prior explanations, including expressivity, entanglement, locality, and noise, are often presented in ways that conflate two distinct issues: concentration of the measured observable and loss of parameter sensitivity caused by circuit dynamics. We develop a unified statistical framework that separates these mechanisms. We show that several standard BP explanations, including locality- and entanglement-related effects, can be understood through a single phenomenon that we term observable concentration (OC). Importantly, we prove that avoiding OC is necessary but not sufficient for trainability. Beyond OC, we identify two distinct mid-circuit sources of gradient suppression. First, mid-circuit information loss occurs when parameter perturbations propagate into degrees of freedom that are inaccessible to the final measurement, yielding little or no response. Second, mid-circuit information scrambling occurs when local perturbations rapidly spread across the system and become effectively undetectable on the measured subsystem. We support our theory with explicit constructions and numerical evidence, including quantum convolutional neural network architectures that exhibit information-loss-induced barren plateaus despite the absence of observable concentration.

Paper Structure (18 sections, 3 theorems, 63 equations, 5 figures)

Key Result

Theorem 1

For a locally scrambling ensemble $\mathcal{E}$ and a state $\rho = U \rho_0 U^\dagger$ with $U \sim \mathcal{E}$, the second moment of the observable $g$ is bounded by [equation missing in source], where $A = \mathrm{supp}(g)$ denotes the support of the Pauli operator $g$, $\rho_A$ is the reduced density matrix of $\rho$ on subsystem $A$, and $D_{\rm HS}(\rho_A):=\Vert \rho_A-\mathds{1}/2^{|A|} \Vert_{\rm F}$ denotes the Hi…
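The quantity $D_{\rm HS}(\rho_A)$ defined in the theorem statement can be evaluated directly for small systems. The following is a minimal NumPy sketch, assuming a pure $n$-qubit state given as a state vector; the function names `reduced_density_matrix` and `hs_distance_from_mixed` are illustrative and do not come from the paper.

```python
import numpy as np

def reduced_density_matrix(psi, keep, n):
    """Reduced state of a pure n-qubit state vector `psi` on the qubits in `keep`.

    Qubit 0 is taken to be the most significant index (an assumption of this sketch).
    """
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    # Reshape to one axis per qubit, move the kept qubits to the front,
    # then flatten into a (kept) x (traced-out) matrix.
    t = psi.reshape([2] * n)
    t = np.transpose(t, keep + rest).reshape(2 ** len(keep), -1)
    # Contracting over the traced-out index gives rho_A.
    return t @ t.conj().T

def hs_distance_from_mixed(rho_a):
    """Frobenius-norm distance || rho_A - 1/2^{|A|} ||_F from the maximally mixed state."""
    d = rho_a.shape[0]
    return np.linalg.norm(rho_a - np.eye(d) / d, ord="fro")

n = 3

# Product state |000>: the reduced state on one qubit is pure.
product = np.zeros(2 ** n, dtype=complex)
product[0] = 1.0

# GHZ state (|000> + |111>)/sqrt(2): the reduced state on one qubit is maximally mixed.
ghz = np.zeros(2 ** n, dtype=complex)
ghz[0] = ghz[-1] = 1.0 / np.sqrt(2.0)

d_product = hs_distance_from_mixed(reduced_density_matrix(product, [0], n))
d_ghz = hs_distance_from_mixed(reduced_density_matrix(ghz, [0], n))
print(d_product, d_ghz)
```

In this sketch the product state gives a distance of $\sqrt{1/2}$ on a single qubit, while the GHZ state gives zero, the two extremes of $D_{\rm HS}$ for $|A| = 1$.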

Figures (5)

  • Figure 1: Demonstration of information loss and gradient vanishing. (a) depicts information loss in variational quantum circuits. (b) gives a simple illustration of how this information loss results in a vanishing response. (c) shows the different statistical behaviors of BPs with and without observable concentration (OC).
  • Figure 2: Circuit models to demonstrate mid-circuit response. $\mathcal{H}_{1},\mathcal{H}_2$ and $\mathcal{H}_3$ denote different Hilbert spaces with dimensions $d_1$, $d_2$, and $d_3$ respectively. The measurement is taken within $\mathcal{H}_2\otimes\mathcal{H}_3$.
  • Figure 3: Hierarchical tree circuit model (linear depth). This quantum circuit consists of $L$ layers of single-qubit and two-qubit parameterized gates. The qubits are indexed sequentially from bottom to top as $1,2,\ldots,n$. Similarly, the layers are labeled consecutively from left to right as $1,2,\ldots,L$.
  • Figure 4: Hierarchical tree-circuit trainability diagnostics via parameter-shift statistics. (a)-(f) present scatter plots of shifted terms in circuits containing $3,5,7,9,11$, and $13$ qubits, respectively. (g) illustrates how the gradient variance vanishes with increasing number of qubits, with a gray dashed line showing a linear fit on a logarithmic scale. (h) compares $(1-r)$ with the observable variance, together with a straight-line fit of $(1-r)$, also on a logarithmic scale.
  • Figure 5: Circuit models to demonstrate the impact of information scrambling on BPs. The measurement is applied within the joint space $\mathcal{H}_1\otimes\mathcal{H}_2$.
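The "shifted terms" in the Figure 4 caption refer to the two circuit evaluations used by the parameter-shift rule. As a hedged illustration only, the sketch below applies the standard rule to a single-qubit toy circuit, $R_Y(\theta)|0\rangle$ measured in $Z$, which is an assumption of this example and not the hierarchical tree circuit of the paper; the statistic $r$ in panel (h) is not reconstructed here.

```python
import numpy as np

Z = np.diag([1.0, -1.0])

def ry(theta):
    """Single-qubit rotation exp(-i * theta * Y / 2), which is a real matrix."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

def expectation(theta):
    """<0| RY(theta)^dagger Z RY(theta) |0>, which equals cos(theta)."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    return float(psi @ Z @ psi)

def parameter_shift(f, theta, shift=np.pi / 2.0):
    """Return the two shifted terms and the gradient estimate built from them."""
    plus = f(theta + shift)
    minus = f(theta - shift)
    return plus, minus, 0.5 * (plus - minus)

theta = 0.7
plus, minus, grad = parameter_shift(expectation, theta)
print(plus, minus, grad)  # grad agrees with the analytic derivative -sin(theta)
```

For gates of the form $e^{-i\theta P/2}$ with $P$ a Pauli operator, the rule is exact, so the gradient statistics of a circuit can be studied through the joint distribution of the two shifted evaluations.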

Theorems & Definitions (11)

  • Definition 1: Barren plateaus
  • Definition 2: Locally scrambling ensemble
  • Theorem 1: Observable Concentration from Local Scrambling
  • Theorem 2: Information loss and barren plateaus
  • Example 1
  • Corollary 3: Batch-decomposed form of \ref{obs:mid-circ}
  • Example 2
  • proof
  • proof
  • proof
  • ...and 1 more