A unifying account of warm start guarantees for patches of quantum landscapes

Hela Mhiri, Ricard Puig, Sacha Lerch, Manuel S. Rudolph, Thiparat Chotibut, Supanut Thanasilp, Zoë Holmes

TL;DR

The paper addresses warm-start guarantees for patches of quantum loss landscapes in variational quantum algorithms, focusing on regions around points with non-vanishing curvature. It develops a unifying variance-lower-bound framework that encompasses prior patch-based results and extends to problem-inspired ansätze, showing that patches of radius $r_{\text{patch}} \in \Theta\left(\frac{1}{\sqrt{m}\,\mathrm{poly}(n)}\right)$ have a loss variance that is not exponentially small. The key insights connect curvature and loss dynamics to Fourier frequencies: patch width is governed by the maximal and effective frequencies, and correlated parameters tend to shrink the region of attraction. Numerical results support the existence of fertile valleys but indicate that barren-plateau behavior tends to persist within constant-width subregions, implying that scalable warm-start strategies will require increasingly precise initializations as system size grows. The work also provides upper bounds showing that broad patches inherit the barren plateau of the full landscape, highlighting both the promise and the limits of warm-starting in variational quantum computing.

Abstract

Barren plateaus are fundamentally a statement about quantum loss landscapes on average but there can, and generally will, exist patches of barren plateau landscapes with substantial gradients. Previous work has studied certain classes of parameterized quantum circuits and found example regions where gradients vanish at worst polynomially in system size. Here we present a general bound that unifies all these previous cases and that can tackle physically-motivated ansätze that could not be analyzed previously. Concretely, we analytically prove a lower-bound on the variance of the loss that can be used to show that in a non-exponentially narrow region around a point with curvature the loss variance cannot decay exponentially fast. This result is complemented by numerics and an upper-bound that suggest that any loss function with a barren plateau will have exponentially vanishing gradients in any constant radius subregion. Our work thus suggests that while there are hopes to be able to warm-start variational quantum algorithms, any initialization strategy that cannot get increasingly close to the region of attraction with increasing problem size is likely inadequate.

Paper Structure

This paper contains 55 sections, 22 theorems, 423 equations, 6 figures, 5 tables.

Key Result

Theorem 1

Consider a generic loss $\mathcal{L}(\boldsymbol{\theta})$ of the form in Eq. \ref{eq:loss} and a parametrized quantum circuit $U(\boldsymbol{\theta})$ of the form in Eq. \ref{eq:circuit}. We consider uniformly sampling parameters in a hypercube of width $2r$ around any point $\boldsymbol{\phi}$ of the landscape. We can find a patch of size $r_{\rm patch}$ such that, $\forall\, r \leqslant r_{\rm patch}$, the variance of the loss over the hypercube admits a lower bound; see Theorem 1 of the paper for the explicit expression.
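
For orientation, the quantity that the theorem bounds can be written out as follows. This is a sketch only: the symbols $\mathcal{V}(\boldsymbol{\phi}, r)$ for the hypercube and $\mathcal{U}$ for the uniform distribution are notation assumed here and are not taken from the paper.

$$
\mathcal{V}(\boldsymbol{\phi}, r) = \left\{ \boldsymbol{\theta} : |\theta_i - \phi_i| \leqslant r \;\; \forall i \right\},
\qquad
\mathrm{Var}_{\boldsymbol{\theta} \sim \mathcal{U}(\mathcal{V}(\boldsymbol{\phi}, r))}\left[\mathcal{L}(\boldsymbol{\theta})\right]
= \mathbb{E}\left[\mathcal{L}(\boldsymbol{\theta})^2\right] - \left(\mathbb{E}\left[\mathcal{L}(\boldsymbol{\theta})\right]\right)^2 ,
$$

where both expectations are taken over $\boldsymbol{\theta}$ drawn uniformly from $\mathcal{V}(\boldsymbol{\phi}, r)$, so each parameter ranges over an interval of width $2r$ centered at $\phi_i$.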

Figures (6)

  • Figure 1: Schematic of different types of correlations. Spatial correlations, where we correlate gates in different layers, are shown in green. These can result from a gate with a generator that acts on multiple qubits with one variational parameter. Time correlations between parameters in different layers are shown in blue. The function $\mathcal{S}$ maps a gate label to a corresponding parameter label.
  • Figure 2: Schematic summary of our main results. In panel (a), the solid blue curve sketches a generic loss landscape $\mathcal{L}(\boldsymbol{\theta})$ and the horizontal bar above it indicates its second-order derivative $\mathcal{L}^{(2)}_i(\boldsymbol{\theta})$. The red sections indicate vanishing second derivatives and the green sections represent curvature-rich regions with non-negligible second derivatives, which can occur, for example, near a global minimum $\boldsymbol{\theta}^*$, around identity $\mathbf{0}$, or simply around arbitrary local minima $\boldsymbol{\phi}$. Panel (b) shows the variance of $\mathcal{L}(\boldsymbol{\theta})$ for parameters $\boldsymbol{\theta}$ sampled uniformly in a hypercube of width $2r$ centered around any point with non-negligible curvature. On the full landscape $(r = r_{\mathrm{full}})$, prior works (e.g., Ref. cerezo2020cost) show that certain families of circuits exhibit a barren plateau. Our Proposition 1, which strictly applies only to a restricted family of circuits, formalizes that any subregion of size $r \in (c' r_{\rm full}, r_{\rm full})$, for some constant $c' < 1$, still inherits an exponentially vanishing loss variance over that region. By contrast, the main Theorem 1 states that for polynomial-depth circuits, as long as $r$ shrinks no faster than $1/\mathrm{poly}(n)$, the corresponding patch (green) still supports non-exponentially vanishing variance.
  • Figure 3: Role of Fourier frequencies. Here we sketch the Fourier decomposition of the loss $\mathcal{L}(\boldsymbol{\theta})$ with respect to a parameter $\theta_j$. Intuitively, the width of patches with gradients depends inversely on the magnitude of the frequencies in the Fourier decomposition. When many high frequencies are present, both the maximum and the effective (dominant) frequencies are high and the minima tend to be narrower than when those frequencies are low. Note that this figure is meant only as an illustration of the frequencies and the role they play in the loss function.
  • Figure 4: Example circuit architectures schematic. Here we sketch the structure of the four families of circuits we analyze: a) a tensor product ansatz, b) a hardware efficient ansatz (HEA), c) the Hamiltonian variational ansatz (HVA) and d) the unitary coupled cluster (UCC) ansatz.
  • Figure 5: Patch variance for correlated and uncorrelated product ansatz. Here we study the landscape of a loss function of the form in Eq. \ref{eq:loss} with $\rho = |\psi \rangle \langle \psi |$, where $|\psi\rangle = \frac{1}{\sqrt{2}}(|+\rangle^{\otimes n} + |-\rangle^{\otimes n})$ and $O = \bigotimes_{i=1}^{n} \sigma_{z}^{(i)}$. We consider a tensor product ansatz composed of $R_X(\theta), R_Z(\theta), R_X(\theta)$ rotations applied on each qubit. We plot the variance of $\mathcal{L}(\boldsymbol{\theta})$ in a hypercube around $\boldsymbol{0}$ as a function of $r$ when the parameters are correlated (green) and uncorrelated (blue). The maximum variance, $\text{Var}_{\rm max}$, and its location, $r_{\rm max}$, are indicated for $n = 18$ in the correlated case.
  • ...and 1 more figure
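
The setup described in the Figure 5 caption can be explored numerically with a small Monte Carlo sketch. The code below is not the authors' implementation and rests on several assumptions: the gate order per qubit is taken to be $R_X$, then $R_Z$, then $R_X$; the "correlated" case is modeled as one parameter shared by every gate; and the names `loss` and `patch_variance` are hypothetical. Because the ansatz and the observable are both tensor products, the loss factorizes into per-qubit $2 \times 2$ matrix elements, so no $2^n$-dimensional state vector is needed.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
# Columns of H are |+> and |->; H is real and symmetric, so H^dagger = H.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def rx(t):
    """Batched R_X(t) = exp(-i t X / 2); output shape t.shape + (2, 2)."""
    c, s = np.cos(t / 2), -1j * np.sin(t / 2)
    return np.stack([np.stack([c, s], -1), np.stack([s, c], -1)], -2)

def rz(t):
    """Batched R_Z(t) = exp(-i t Z / 2); output shape t.shape + (2, 2)."""
    e = np.exp(-1j * t / 2)
    z = np.zeros_like(e)
    return np.stack([np.stack([e, z], -1), np.stack([z, e.conj()], -1)], -2)

def loss(thetas):
    """Loss <psi| U^dagger Z^{(x)n} U |psi> for thetas of shape (samples, n, 3).

    Assumed gate order on each qubit: R_X(theta_0), R_Z(theta_1), R_X(theta_2).
    """
    thetas = np.asarray(thetas, dtype=float)
    U = rx(thetas[..., 2]) @ rz(thetas[..., 1]) @ rx(thetas[..., 0])
    A = np.conj(np.swapaxes(U, -1, -2)) @ Z @ U   # per-qubit U^dagger Z U
    B = H @ A @ H                                  # matrix elements in the |+>, |-> basis
    diag = B[..., 0, 0].prod(-1) + B[..., 1, 1].prod(-1)
    cross = B[..., 0, 1].prod(-1) + B[..., 1, 0].prod(-1)
    return 0.5 * (diag + cross).real

def patch_variance(n, r, samples=5000, correlated=False, seed=0):
    """Monte Carlo estimate of Var[L] over the hypercube [-r, r]^m around 0."""
    rng = np.random.default_rng(seed)
    if correlated:
        # Assumption: a single parameter shared by all gates.
        shared = rng.uniform(-r, r, size=(samples, 1, 1))
        thetas = np.broadcast_to(shared, (samples, n, 3))
    else:
        thetas = rng.uniform(-r, r, size=(samples, n, 3))
    return loss(thetas).var()

if __name__ == "__main__":
    for r in (0.05, 0.2, 0.8, np.pi):
        print(r, patch_variance(6, r), patch_variance(6, r, correlated=True))
```

Sweeping `r` and `n` with this sketch gives a rough way to look at how the patch variance changes with the hypercube width; the estimates carry sampling noise that depends on `samples`.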

Theorems & Definitions (40)

  • Theorem 1: Lower bound on the loss variance, Informal
  • Corollary 1: Scaling of regions of attraction, Informal
  • Proposition 1: Upper bound on the variance
  • Theorem 2: Taylor remainder theorem for a single variable real function
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Proposition 2
  • proof
  • ...and 30 more