Optimal training of variational quantum algorithms without barren plateaus

Tobias Haug, M. S. Kim

TL;DR

This work tackles the notorious barren plateau problem in variational quantum algorithms by proposing that fidelity between quantum states defines a Gaussian kernel in the PQC parameter space, weighted by the quantum Fisher information metric. It introduces adaptive learning rates and a generalized quantum natural gradient (GQNG), showing that a stable, beta-tuned gradient direction can dramatically accelerate training and control tasks. A key result is a gradient-variance bound that remains non-vanishing when the initial fidelity is bounded below by $\gamma$, enabling trainability on larger systems and identifying a barren-plateau-free VQA instance in projected variational quantum dynamics. The approach also connects to quantum machine learning through Gaussian-kernel realizations on hardware-efficient PQCs, suggesting practical near-term benefits for state preparation, quantum control, and ML-inspired quantum algorithms.
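
As a rough sketch of what the kernel statement means: the fit relation quoted in the Figure 4 caption, $c[\frac{1}{4}\Delta\boldsymbol{\theta}^\text{T}\mathcal{F}\Delta\boldsymbol{\theta}]^\nu=-c\log^\nu[1-\Delta K_\text{t}(\boldsymbol{\theta})]$, suggests the Gaussian form below, assuming the infidelity is defined as $\Delta K_\text{t}=1-K_\text{t}$ (that definition is not reproduced on this page). The second line is illustrative arithmetic rather than a result from the paper: it estimates where the Gaussian would drop to the random-state value $\frac{1}{2^N}$ mentioned in the Figure 2 caption.

$$K_\text{t}(\boldsymbol{\theta}) \approx \exp\!\left[-\tfrac{1}{4}\,\Delta\boldsymbol{\theta}^\text{T}\mathcal{F}(\boldsymbol{\theta})\,\Delta\boldsymbol{\theta}\right], \qquad \Delta\boldsymbol{\theta}=\boldsymbol{\theta}-\boldsymbol{\theta}_\text{t}$$

$$\exp\!\left[-\tfrac{1}{4}x\right]=2^{-N} \;\Longrightarrow\; x=4N\ln 2 \approx 27.7 \ \text{for}\ N=10$$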

Abstract

Variational quantum algorithms (VQAs) promise efficient use of near-term quantum computers. However, training VQAs often requires an extensive amount of time and suffers from the barren plateau problem where the magnitude of the gradients vanishes with increasing number of qubits. Here, we show how to optimally train VQAs for learning quantum states. Parameterized quantum circuits can form Gaussian kernels, which we use to derive adaptive learning rates for gradient ascent. We introduce the generalized quantum natural gradient that features stability and optimized movement in parameter space. Both methods together outperform other optimization routines in training VQAs. Our methods also excel at numerically optimizing driving protocols for quantum control problems. The gradients of the VQA do not vanish when the fidelity between the initial state and the state to be learned is bounded from below. We identify a VQA for quantum simulation with such a constraint that thus can be trained free of barren plateaus. Finally, we propose the application of Gaussian kernels for quantum machine learning.
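
The two ingredients named in the abstract can be illustrated on a toy model. The sketch below is not the authors' implementation. It assumes the GQNG has the form $G_\beta=\mathcal{F}^{-\beta}\nabla K_\text{t}$ (consistent with the captions, where $\beta=0$ is the regular gradient and $\beta=1$ the QNG), that the fidelity is exactly the Gaussian kernel $\exp[-\frac{1}{4}\Delta\boldsymbol{\theta}^\text{T}\mathcal{F}\Delta\boldsymbol{\theta}]$ with a constant QFIM, and that the step size follows from requiring the update to land on the kernel's maximum. The function names and the `eps` regularization are hypothetical.

```python
import numpy as np

def gqng_direction(grad, qfim, beta=0.5, eps=0.0):
    """Assumed GQNG form: F^(-beta) @ grad for a symmetric positive-definite F."""
    # eps is a hypothetical ridge-type regularization, not taken from the paper.
    evals, evecs = np.linalg.eigh(qfim + eps * np.eye(len(grad)))
    return evecs @ (evals ** (-beta) * (evecs.T @ grad))

def kernel_step_size(fidelity, direction, qfim):
    """Step size alpha solving -log(K) = alpha^2 * d^T F d / 4 (Gaussian-kernel assumption)."""
    return 2.0 * np.sqrt(-np.log(fidelity)) / np.sqrt(direction @ qfim @ direction)

def toy_fidelity(theta, theta_t, qfim):
    """Toy stand-in for K_t: an exact Gaussian kernel with constant QFIM."""
    d = theta - theta_t
    return np.exp(-0.25 * d @ qfim @ d)

def toy_gradient(theta, theta_t, qfim):
    """Analytic gradient of the toy fidelity: -K * F @ d / 2."""
    d = theta - theta_t
    return -0.5 * toy_fidelity(theta, theta_t, qfim) * (qfim @ d)

rng = np.random.default_rng(0)
n = 4
a = rng.normal(size=(n, n))
qfim = a @ a.T + np.eye(n)              # random positive-definite "QFIM"
theta_t = rng.normal(size=n)            # target parameters
theta = theta_t + 0.3 * rng.normal(size=n)

k0 = toy_fidelity(theta, theta_t, qfim)
g0 = toy_gradient(theta, theta_t, qfim)
for beta in (0.0, 0.5, 1.0):
    step_dir = gqng_direction(g0, qfim, beta)
    alpha = kernel_step_size(k0, step_dir, qfim)
    theta_new = theta + alpha * step_dir
    print(f"beta={beta}: fidelity after one step = "
          f"{toy_fidelity(theta_new, theta_t, qfim):.4f}")
```

In this idealized model the $\beta=1$ step lands exactly on $\boldsymbol{\theta}_\text{t}$, because $\mathcal{F}^{-1}\nabla K_\text{t}$ points straight at the target. On a real PQC the kernel is only approximate and the QFIM varies with $\boldsymbol{\theta}$, so no such guarantee should be expected.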

Paper Structure

This paper contains 15 sections, 56 equations, 17 figures.

Figures (17)

  • Figure 1: a) The variational quantum algorithm (VQA) consists of a parameterized quantum circuit (PQC) that generates the quantum state $|\psi(\boldsymbol{\theta}) \rangle=U(\boldsymbol{\theta})|0 \rangle$ with unitary $U(\boldsymbol{\theta})$ and parameters $\boldsymbol{\theta}$, as well as a classical optimization routine. Measurements on the quantum state are used to calculate the cost function, which is then optimized by the classical optimizer in a feedback loop by adjusting the parameters $\boldsymbol{\theta}$. b) VQA to represent the target state $|\psi_\text{t} \rangle$ using $|\psi(\boldsymbol{\theta}) \rangle$. The goal is to find target parameters $\boldsymbol{\theta}_\text{t}=\text{argmax}_{\boldsymbol{\theta}}K_\text{t}(\boldsymbol{\theta})$ that approximate the target state by maximizing the fidelity $K_\text{t}(\boldsymbol{\theta})=\left|\langle \psi_\text{t} \vert \psi(\boldsymbol{\theta}) \rangle\right|^2$. Training is performed using the gradient $G_0(\boldsymbol{\theta})=\nabla K_\text{t}(\boldsymbol{\theta})$, which points in the direction of steepest increase of fidelity. c) The landscape of the fidelity $K_\text{t}(\boldsymbol{\theta})$ as a function of $\boldsymbol{\theta}$ often has barren plateaus, where the fidelity and its gradients are exponentially small within most of the parameter space. However, as long as the initial quantum state of the VQA is guaranteed to have a lower-bounded fidelity, the magnitude of the gradient does not vanish even for many qubits and the barren plateaus can be avoided (Eq. (\ref{eq:var_grad_bound})). d) Gradient ascent optimizes the fidelity by updating $\boldsymbol{\theta}'=\boldsymbol{\theta} +\alpha G_0(\boldsymbol{\theta})$. As the fidelity landscape as a function of $\boldsymbol{\theta}$ is in general not Euclidean, standard gradient ascent does not take the fastest path. e) By using quantum geometric information about the parameter space with the quantum Fisher information metric (QFIM) $\mathcal{F}(\boldsymbol{\theta})$, the parameter space can be transformed to get the quantum natural gradient (QNG), which moves in the best direction (see solid blue curve). To yield stable gradients in practice, the QNG requires regularization. The generalized quantum natural gradient (GQNG) (Eq. (\ref{eq:general_gradient})) interpolates between the standard gradient and the QNG, and can be stable without regularization. f) The learning rate $\alpha$ for the gradient update is normally a fixed heuristic learning rate (dashed red curves). The fidelity of PQCs can form Gaussian kernels, which are used to calculate adaptive learning rates (Eq. (\ref{eq:update_add})) for each gradient update (blue solid curve).
  • Figure 2: a) Average fidelity $\langle K_\text{t}(\boldsymbol{\theta})\rangle$ as a function of the parameter norm $\Delta \boldsymbol{\theta}^\text{T}\mathcal{F}(\boldsymbol{\theta})\Delta \boldsymbol{\theta}$, with distance $\Delta \boldsymbol{\theta}=\boldsymbol{\theta}-\boldsymbol{\theta}_\text{t}$ and target parameters $\boldsymbol{\theta}_\text{t}$. The shaded area spans the 20th to 80th percentiles of the fidelity. We find a good match with the Gaussian kernel (Eq. (\ref{eq:kernel}), dash-dotted line). For large norm, the fidelity converges to the fidelity given by random states $\langle \mathcal{K}_\text{rand}\rangle=\frac{1}{2^N}$ (dashed lines). We use three different PQCs with randomized parameters, which are defined in Appendix \ref{app:pqc}. Number of layers $p=20$ for R-CPHASE $N=10$, $p=16$ for $N=16$, else $p=10$. Averaged over 50 random instances of $\boldsymbol{\theta}_\text{t}$. b) Variance of the gradient $\text{var}(\partial_k K_\text{t}(\boldsymbol{\theta}))$ against infidelity $\Delta K_\text{t}(\boldsymbol{\theta})$ for different types of PQCs. Dashed lines are the analytic formula Eq. (\ref{eq:var_grad}) for the variance.
  • Figure 3: a) Variance of the gradient $\text{var}(\partial_k K_\text{t}(\boldsymbol{\theta}))$ against number of qubits $N$ for different types of PQCs and infidelities $\Delta K_\text{t}(\boldsymbol{\theta})$. The number of layers is $p=20$ for R-CPHASE and $p=10$ for YZ-CNOT. Dashed lines are the analytic formula Eq. (\ref{eq:var_grad}) for the variance of the gradient. b) Average infidelity $\langle \Delta F(\boldsymbol{\theta}_\text{t}')\rangle$ after one iteration of gradient ascent with adaptive learning rate (Eq. (\ref{eq:update_add})) plotted against exponent $\beta$ of the GQNG (Eq. (\ref{eq:general_gradient})) for different types of PQCs and regularization parameter $\epsilon_\text{R}$. Initial infidelity is $\Delta K_\text{t}(\boldsymbol{\theta})=0.9$.
  • Figure 4: a) Average infidelity $\langle\Delta K_\text{t}(\boldsymbol{\theta}_\text{t}')\rangle$ (Eq. (\ref{eq:infidelity})) plotted against learning rate $\lambda$ using the gradient ascent update $\boldsymbol{\theta}_\text{t}'=\boldsymbol{\theta}+\lambda G_{\frac{1}{2}}(\boldsymbol{\theta})$ with GQNG. $\lambda$ is normalized with respect to the adaptive learning rate $\alpha_\text{t}$ (Eq. (\ref{eq:update_add})), shown as a vertical dashed line. Curves show various initial infidelities $\Delta K_\text{t}(\boldsymbol{\theta})$, with the shaded area being the standard deviation of $\Delta K_\text{t}(\boldsymbol{\theta}_\text{t}')$. Infidelity is averaged over 50 random instances of $\boldsymbol{\theta}_\text{t}$ for the YZ-CNOT PQC. b) Average infidelity $\langle \Delta K_\text{t}(\boldsymbol{\theta}_\text{t}') \rangle$ after one iteration of adaptive gradient ascent against initial infidelity $\Delta K_\text{t}(\boldsymbol{\theta})$. We show the regular gradient ($\beta=0$, upper curves), GQNG ($\beta=\frac{1}{2}$, center curves) and QNG ($\beta=1$, regularization $\epsilon_\text{R}=10^{-1}$, lower curves) for various types of PQCs. The red and black curves are fits with $\Delta K_\text{t}(\boldsymbol{\theta}_\text{t}') =c[\frac{1}{4}\Delta\boldsymbol{\theta}^\text{T}\mathcal{F}\Delta\boldsymbol{\theta}]^\nu=-c\log^\nu[1- \Delta K_\text{t}(\boldsymbol{\theta}) ]$ with $\nu=1$ for $\beta=0$, $\beta=\frac{1}{2}$ and $\nu=1.5$ for $\beta=1$. The scaling factor is $c(\beta=1)=0.072$, $c(\beta=\frac{1}{2})=0.14$ and $c(\beta=0)=0.32$.
  • Figure 5: a) Training VQA. Average infidelity $\langle \Delta K_\text{t}(\boldsymbol{\theta}_\text{t}') \rangle$ against number of iterations for optimizing the VQA. The shaded area is the standard deviation over 50 instances of training. We compare different optimization methods. We find that adaptive gradient ascent with QNG (A-QNG, $\beta=1$, regularization $\epsilon_\text{R}=10^{-1}$) performs best, followed by adaptive GQNG (A-GQNG, $\beta=\frac{1}{2}$, $\epsilon_\text{R}=0$). Standard optimization methods such as Adam ($\alpha=0.1$) and LBFGS perform comparably to adaptive gradient ascent with the regular gradient (A-G). Non-adaptive QNG (S-QNG, $\alpha=1$, $\epsilon_\text{R}=10^{-1}$) is initially worse, but after more iterations it outperforms the methods that do not use the QFIM. Initial infidelity is $\Delta K_\text{t}(\boldsymbol{\theta})=0.9$, training is averaged over 50 random instances of $\boldsymbol{\theta}_\text{t}$, PQC is YZ-CNOT, $N=10$ and $p=10$. b) Optimizing control problem. Average infidelity $\langle \Delta K_\text{g}(h') \rangle$ against number of iterations for optimizing driving parameters $h'$. We use the driving Hamiltonian Eq. (\ref{eq:control}) with $g=1$, $N=6$, $\Delta t=1$ and $T=d=16$. The goal is to find the driving protocol that evolves the zero state to the ground state of Eq. (\ref{eq:ising}) with $g=1$ and $h=1$. We average the training data over 20 instances of initially random protocols with $h_n^p\in[-1,1]$.
  • ...and 12 more figures
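
To make the fit quoted in the Figure 4 caption concrete, the sketch below evaluates $\Delta K_\text{t}(\boldsymbol{\theta}_\text{t}')=c\,(-\log[1-\Delta K_\text{t}(\boldsymbol{\theta})])^\nu$ with the listed values of $c$ and $\nu$. The function name and the dictionary layout are hypothetical, and the fit is presumably only meaningful over the range of initial infidelities shown in the figure.

```python
import math

# (c, nu) per exponent beta, as quoted in the Figure 4 caption.
FIT_PARAMS = {
    0.0: (0.32, 1.0),   # regular gradient
    0.5: (0.14, 1.0),   # GQNG
    1.0: (0.072, 1.5),  # QNG with regularization 1e-1
}

def fitted_infidelity_after_step(initial_infidelity, beta):
    """Evaluate the caption's fit c * (-log(1 - dK))**nu for one adaptive step."""
    if not 0.0 <= initial_infidelity < 1.0:
        raise ValueError("initial infidelity must lie in [0, 1)")
    c, nu = FIT_PARAMS[beta]
    return c * (-math.log(1.0 - initial_infidelity)) ** nu

for beta in (0.0, 0.5, 1.0):
    value = fitted_infidelity_after_step(0.9, beta)
    print(f"beta={beta}: fitted infidelity after one step = {value:.3f}")
```

For the initial infidelity $0.9$ used in Figures 3 and 5, the fit gives roughly $0.74$, $0.32$ and $0.25$ for $\beta=0$, $\frac{1}{2}$ and $1$. This reproduces the ordering described in the caption, where the regular gradient is the upper curve and the QNG the lower one.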