Variational quantum simulation: a case study for understanding warm starts

Ricard Puig; Marc Drudis; Supanut Thanasilp; Zoë Holmes

TL;DR

This work investigates warm-start strategies for variational quantum simulation through a case study of iterative circuit compression for real-time evolution. It derives analytic bounds showing that, in a hypercube around the previous iteration’s solution whose width scales as 1/√M, the loss variance decays only polynomially with system size, and it proves that there are regions of approximate convexity in which trainability is enhanced for polynomial-size time steps δt. It then formalizes the notion of an adiabatic minimum and proves conditions under which this minimum remains within the trainable region across iterations. It also discusses the possibility of minima jumps and the existence of fertile valleys that permit training despite a barren plateau across the full landscape. The results generalize to other iterative fidelity-based losses, including imaginary-time evolution and unitary learning, and offer a framework for understanding and extending warm-start strategies in iterative quantum algorithms. Overall, the work clarifies the nuanced trainability landscape of warm-started variational algorithms and provides principled guidance on step sizes, region sizes, and the potential to use intermediate valleys for efficient optimization.

Abstract

The barren plateau phenomenon, characterized by loss gradients that vanish exponentially with system size, poses a challenge to scaling variational quantum algorithms. Here we explore the potential of warm starts, whereby one initializes closer to a solution in the hope of enjoying larger loss variances. Focusing on an iterative variational method for learning shorter-depth circuits for quantum real-time evolution, we conduct a case study to elucidate the potential and limitations of warm starts. We start by proving that the iterative variational algorithm will exhibit substantial (at worst vanishing polynomially in system size) gradients in a small region around the initializations at each time-step. Convexity guarantees for these regions are then established, suggesting trainability for polynomial-size time-steps. However, our study highlights scenarios where a good minimum shifts outside the region with trainability guarantees. Our analysis leaves open the question of whether such minima jumps necessitate optimization across barren plateau landscapes or whether there exist gradient flows, i.e., fertile valleys away from the plateau with substantial gradients, that allow for training. While our main focus is on this case study of variational quantum simulation, we end by discussing how our results extend to other iterative settings.
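The iterative warm-start scheme described in the abstract can be illustrated with a small numerical sketch. The following is not the authors' implementation. It assumes a two-qubit Ising-type Hamiltonian, a two-layer Hamiltonian-variational-style ansatz, an infidelity loss, finite-difference gradients, and plain gradient descent that accepts only improving steps. All function names (`expm_herm`, `ansatz_state`, `compress`) and hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

# Pauli matrices and two-qubit operators (illustrative n = 2 example, an assumption).
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

H_XX = np.kron(X, X)                     # interaction term
H_Y = np.kron(Y, I2) + np.kron(I2, Y)    # field term
H = H_XX - 0.95 * H_Y                    # Ising-type Hamiltonian

def expm_herm(A, t):
    """Return exp(-i * t * A) for a Hermitian matrix A via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.exp(-1j * t * vals)) @ vecs.conj().T

def ansatz_state(theta, psi0):
    """Apply layers exp(-i b H_Y) exp(-i a H_XX) to psi0 (hypothetical ansatz)."""
    psi = psi0
    for a, b in theta.reshape(-1, 2):
        psi = expm_herm(H_Y, b) @ (expm_herm(H_XX, a) @ psi)
    return psi

def loss(theta, target, psi0):
    """Infidelity between the ansatz state and the target state."""
    return 1.0 - abs(np.vdot(target, ansatz_state(theta, psi0))) ** 2

def grad(theta, target, psi0, eps=1e-6):
    """Central finite-difference gradient (a classical stand-in for measured gradients)."""
    g = np.zeros_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = eps
        g[k] = (loss(theta + e, target, psi0) - loss(theta - e, target, psi0)) / (2 * eps)
    return g

def compress(n_steps=5, dt=0.05, n_iters=200, lr=0.5):
    """Iteratively compress time evolution, warm-starting from the previous parameters."""
    psi0 = np.zeros(4, dtype=complex)
    psi0[0] = 1.0                         # initial state |00>
    theta = np.zeros(4)                   # 2 layers x 2 parameters
    step = expm_herm(H, dt)
    history = []
    for _ in range(n_steps):
        # New target: the previous solution evolved by one short time-step.
        target = step @ ansatz_state(theta, psi0)
        start = loss(theta, target, psi0)  # loss at the warm start
        current = start
        for _ in range(n_iters):
            trial = theta - lr * grad(theta, target, psi0)
            trial_loss = loss(trial, target, psi0)
            if trial_loss >= current:      # stop once a step no longer improves
                break
            theta, current = trial, trial_loss
        history.append((start, current))   # theta now serves as the next warm start
    return theta, history

if __name__ == "__main__":
    theta, history = compress()
    for i, (start, final) in enumerate(history, 1):
        print(f"iteration {i}: warm-start loss {start:.2e}, trained loss {final:.2e}")
```

Calling `compress()` returns the final parameters together with one `(warm-start loss, trained loss)` pair per time-step. Because the warm start sits a single short time-step away from the new target, its loss is already small, which is the regime the trainability guarantees concern.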

Paper Structure (41 sections, 23 theorems, 229 equations, 8 figures)

Introduction
Preliminaries
Iterative Variational Time-evolution Compression
Gradient magnitudes and barren plateaus
Main Results
Overview of analysis
Lower-bound on the variance
Convexity region around the starting point
Adiabatic minimum
Minima jumps and fertile valleys
Outlook on other iterative approaches
Discussion
Acknowledgments
Code Availability
Preliminaries
...and 26 more sections

Key Result

Theorem 1

Consider the general ansatz in Eq. \eqref{eq:circuit} and assume that in the first iteration the system is prepared in an initial state $\rho_0$ and let us choose $\sigma_1$ such that $\Tr[\rho_0 \sigma_1 \rho_0 \sigma_1] = 0$. Given that the time-step is bounded as where $\lambda_{\rm max}$ is the largest eigenvalue of $H$ and we consider uniformly sampling parameters in a hypercube of width $2r$ around

Figures (8)

Figure 1: a) Each iteration of the variational compression scheme consists of four steps. Starting from the top: (i) apply the circuit with the last set of parameters $\bm{\theta}^{*}$ to the initial state, (ii) apply $e^{-iH\delta t}$ for a small time-step $\delta t$, (iii) train the circuit initialising your parameters around the previous ones, (iv) update the parameters. b) We sketch a typical representation of a loss function $\mathcal{L}(\boldsymbol{\theta})$ with a barren plateau across the full landscape (Region I). In Theorem \ref{thm:variance-lower-bound} we prove that in a hypercube of width $2r$ with $r \in \Theta\left(\frac{1}{\sqrt{M}}\right)$ (sketched as Region II) the variance of the loss is only polynomially vanishing in system size $n$. In Theorem \ref{thm:convex} we prove that in a smaller hypercube (highlighted as the blue region) the landscape is approximately convex. Similar results for other iterative approaches with a fidelity-based loss are discussed in Section \ref{sec:extension-to-other-iterative} with technical details in Appendix \ref{app:extension}. Illustrative examples include preparing a ground state via imaginary time evolution as shown in Appendix \ref{app:imaginary}, as well as learning an entire unknown unitary via a variational approach (as in Appendix \ref{app:extension-variational-u}) or a machine learning approach (as in Appendix \ref{app:extension-qml-u}).
Figure 2: Variance of landscape and width of narrow gorge. Here we study the landscape of $\mathcal{L}(\boldsymbol{\theta})$, for the first time-step of the variational time-evolution compression algorithm, for different system sizes $n$ as a function of the width of the hypercube $r$. We consider a hardware efficient ansatz with $n$ layers and random initial parameters within the hypercube. a) We plot $\mathcal{L}(\boldsymbol{\theta})$ and its variance ${\rm Var}_{\boldsymbol{\theta}\sim\boldsymbol{\mathcal{D}}(\boldsymbol{0}, r)}[\mathcal{L}(\boldsymbol{\theta})]$ as a function of $r/\pi$. Since the shape of the landscape depends on the direction of the parameter update, to plot $\mathcal{L}(\boldsymbol{\theta})$ we have taken the average over 500 different directions. For ${\rm Var}_{\boldsymbol{\theta}\sim\boldsymbol{\mathcal{D}}(\boldsymbol{0}, r)}[\mathcal{L}(\boldsymbol{\theta})]$, we keep track of its maximum value (marked with a vertical line) for each system size. b) The value $r_{\rm max}$ for which the variance peaks as a function of the number of parameters in the ansatz. c) Maximum value of the variance for different system sizes. While the results shown here are for the first iteration of the variational compression scheme, very similar results are observed at later iterations (in line with Theorem \ref{thm:variance-lower-bound}).
Figure 3: Routine evolution of the adiabatic minimum. Here we study the landscape of $\mathcal{L}(\boldsymbol{\theta})$ as we increase the time-step $\delta t$. We study a 10-qubit Hamiltonian with nearest-neighbour interactions on a 1D lattice with $H=\sum X_i Z_{i+1} - 0.95 \sum Y_i$ where $X_i$, $Y_i$ and $Z_i$ are the X-Pauli, Y-Pauli and Z-Pauli operators on qubit $i$. We use a 2-layered Hamiltonian Variational Ansatz with random initial parameters. a) We plot our landscape for different $\delta t$. The cuts in our high-dimensional $\mathcal{L}(\bm{\theta})$ space contain both the initial parameters $\bm{\theta}=\boldsymbol{0}$ and the adiabatic minimum $\bm{\theta}_{A}(\delta t)$ at $\delta t$. b) We plot the size of our parameter update $\norm{\bm{\theta}}_\infty$, i.e. the distance along the cuts between the old minimum and the new adiabatic minimum, as a function of the time-step for different system sizes (from $n=4$ to $14$, with at least $20$ different instances for each system size). We repeat the experiment with different random initial parameters and plot their mean and standard deviation. c) We show a violin and box plot (with the median and quartiles) of the distributions we obtain for $\delta t=0.2$ as we increase the number of qubits. Note that the color assigned to each number of qubits matches that of the curve in b).
Figure 4: Minimum jump. Here we show a 1D-cut of the landscape $\mathcal{L}(\boldsymbol{\theta})$ as we increase the time-step $\delta t$. The cut includes the initial parameters, with update $\delta t = 0$ and $|| \boldsymbol{\theta} ||_\infty = 0$. We choose a 10-qubit Ising Hamiltonian $H=\sum X_iX_{i+1} - 0.95 \sum Y_i$ on a 1D-lattice. We use a 2-layered Hamiltonian Variational Ansatz.
Figure 5: Fertile valley. a) Here we show a 2D plot of the loss landscape at $\delta t = 0.04$ for a 10-qubit Ising Hamiltonian $H=\sum X_iX_{i+1} - 0.95 \sum Y_i$ on a 1D-lattice and use a 2-layered Hamiltonian Variational Ansatz. $\boldsymbol{\theta_0}$ is the initial starting point and $\boldsymbol{\theta^*}$ is the true global minimum. The axes are chosen using principal component analysis to project the multi-dimensional space into a 2D-plane using ORQVIZ \cite{rudolph2021orqviz} and the white line is the projection of the optimization trajectory onto this 2D-plane. b) We plot the loss and directional loss gradient along the trajectory from the old to new minimum.
...and 3 more figures
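The hypercube sampling behind the Figure 2 caption can be sketched numerically. This is only an illustration under stated assumptions: a two-qubit toy model instead of the hardware efficient ansatz of the figure, an infidelity loss, parameters drawn uniformly from $[-r, r]^M$ around $\boldsymbol{\theta}=\boldsymbol{0}$, and a Monte Carlo variance estimate. The names `loss_variance`, `GENS` and all default values are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

# Ansatz generators and Hamiltonian for an illustrative two-qubit model (an assumption).
GENS = [np.kron(X, X), np.kron(Y, I2) + np.kron(I2, Y)]
H = GENS[0] - 0.95 * GENS[1]

def expm_herm(A, t):
    """Return exp(-i * t * A) for a Hermitian matrix A via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.exp(-1j * t * vals)) @ vecs.conj().T

def loss(theta, target, psi0):
    """Infidelity of the ansatz state with the target; generators alternate per parameter."""
    psi = psi0
    for k, angle in enumerate(theta):
        psi = expm_herm(GENS[k % 2], angle) @ psi
    return 1.0 - abs(np.vdot(target, psi)) ** 2

def loss_variance(r, n_params=4, dt=0.05, n_samples=500, seed=0):
    """Monte Carlo estimate of Var[L(theta)] for theta uniform in [-r, r]^n_params."""
    rng = np.random.default_rng(seed)
    psi0 = np.zeros(4, dtype=complex)
    psi0[0] = 1.0                               # initial state |00>
    target = expm_herm(H, dt) @ psi0            # first time-step: previous solution is theta = 0
    thetas = rng.uniform(-r, r, size=(n_samples, n_params))
    return float(np.var([loss(t, target, psi0) for t in thetas]))

if __name__ == "__main__":
    for r in (1e-3, 0.05, 0.2, 1.0, np.pi):
        print(f"r/pi = {r / np.pi:.4f}  Var = {loss_variance(r):.3e}")
```

Scanning `r` in this way gives the kind of variance-versus-width curve the caption describes. At two qubits there is no barren plateau, so the toy model shows only the sampling procedure and not the scaling with system size.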

Theorems & Definitions (43)

Theorem 1: Lower-bound on the loss variance, Informal
Definition 1: $\epsilon$-convexity
Theorem 2: Approximate convexity of the landscape, Informal
Definition 2: Adiabatic Minimum
Theorem 3: Adiabatic minimum is within provably 'nice' training region, Informal
Theorem 4: A substantial gradient region, Informal
Theorem 5: Taylor remainder theorem
Lemma 6
proof
Definition 3: Convexity
...and 33 more
