Trainability Barriers in Low-Depth QAOA Landscapes

Joel Rajakumar, John Golden, Andreas Bärtschi, Stephan Eidenbenz

TL;DR

QAOA landscapes can exhibit a superpolynomial growth in the number of low-quality local minima even when the number of parameters scales logarithmically with n, which means that the common technique of gradient descent from randomly initialized parameters is doomed to fail beyond small n, and emphasizes the need for good initial guesses of the optimal parameters.

Abstract

The Quantum Alternating Operator Ansatz (QAOA) is a prominent variational quantum algorithm for solving combinatorial optimization problems. Its effectiveness depends on identifying input parameters that yield high-quality solutions. However, understanding the complexity of training QAOA remains an under-explored area. Previous results have given analytical performance guarantees for a small, fixed number of parameters. At the opposite end of the spectrum, barren plateaus are likely to emerge at $\Omega(n)$ parameters for $n$ qubits. In this work, we study the difficulty of training in the intermediate regime, which is the focus of most current numerical studies and near-term hardware implementations. Through extensive numerical analysis of the quality and quantity of local minima, we argue that QAOA landscapes can exhibit a superpolynomial growth in the number of low-quality local minima even when the number of parameters scales logarithmically with $n$. This means that the common technique of gradient descent from randomly initialized parameters is doomed to fail beyond small $n$, and emphasizes the need for good initial guesses of the optimal parameters.

Paper Structure (10 sections, 6 equations, 6 figures, 1 algorithm)

Figures (6)

  • Figure 1: Conceptual overview of where this work fits into previous results on the difficulty of training QAOA circuits. For a small, fixed number of parameters, optimal parameters can be computed in some cases \cite{farhi2015quantum}. When the number of parameters scales linearly in the number of qubits $n$, barren plateaus can emerge \cite{Larocca_2022}, rendering gradient descent from random parameters exponentially ineffective. Finally, at an exponential number of parameters, overparameterization \cite{Larocca_2023} makes trainability straightforward, as gradient descent is guaranteed to converge to a global minimum. In this work, we argue that QAOA can suffer from difficult landscapes even in the sublinear regime.
  • Figure 2: Estimating the quantity of local minima using the mean radius of basins. In panel (a), we depict a cost landscape with many local minima. We show a division of the cost landscape into basins (black lines) and highlight the boundaries of a particular basin (light blue). The number of minima is estimated by dividing the total volume of parameter space by the average volume of the basins. In panels (b) and (c), we sample Erdős-Rényi random graphs with edge probability $0.5$ and $n=8$. We then perform gradient descent from 200 random initializations of parameters (with number of rounds $p=5$) to arrive at 200 local minima. The radius of each minimum's basin is estimated using Algorithm \ref{alg:estimation}. We observe that the radius estimates show low variation across the choice of random vector and the choice of minima (which are randomly selected in Algorithm \ref{alg:estimation}). This supports using the mean basin radius as a proxy for the overall quantity of local minima.
  • Figure 3: Scaling in $n$ of local minima quality and quantity for fixed $p$. For each $n$, we sample 40 Erdős-Rényi random graphs with edge probability $0.5$, and for each graph, we sample 200 random initialization points in the cost landscape. Gradient descent is performed from each of these points, yielding 200 (possibly non-distinct) local minima. (a) The quality of local minima is determined by the fraction of these local minima that reach an approximation ratio $> 0.99$, and (b) the quantity of minima is estimated by applying Algorithm \ref{alg:estimation} to each local minimum and taking the average.
  • Figure 4: Scaling in $p$ of local minima quality and quantity for fixed $n$. For each $n$, we sample 40 Erdős-Rényi random graphs with edge probability $0.5$, and for each $p$, we sample 200 random initialization points for gradient descent on each graph. Local minima quality is determined by the fraction of uniformly sampled points from which gradient descent reaches an approximation ratio $> 0.99$, and the number of minima is estimated using Algorithm \ref{alg:estimation}.
  • Figure 5: Scaling in $n$ and $p$ of mean radius of minima. For each $n$, we sample 40 Erdős-Rényi random graphs with edge probability $0.5$, and for each $p$, we sample 200 random initialization points for gradient descent on each graph. The fact that the mean basin radius (a) decreases as $n$ increases and (b) stays roughly constant in $p$ suggests that the total number of local minima scales superpolynomially for $p\propto \log(n)$.
  • ...and 1 more figure
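
The volume-ratio idea in the Figure 2 caption (total parameter-space volume divided by average basin volume) can be illustrated with a minimal sketch. This is not the paper's Algorithm \ref{alg:estimation}, which is not reproduced here. The function names, the bisection-based radius probe, the ball-shaped basin model, and the common per-parameter `period` are all assumptions made for illustration, and the landscape is a toy cosine function rather than a QAOA cost function.

```python
import math
import numpy as np


def log_ball_volume(dim: int, radius: float) -> float:
    """Log-volume of a dim-dimensional Euclidean ball of the given radius."""
    return (0.5 * dim * math.log(math.pi)
            + dim * math.log(radius)
            - math.lgamma(0.5 * dim + 1.0))


def log_num_minima(mean_radius: float, p: int, period: float = 2.0 * math.pi) -> float:
    """Log of (parameter-space volume) / (volume of one average basin).

    Assumptions of this sketch: 2p parameters, each periodic with the same
    `period`, and basins modeled as balls of radius `mean_radius`.
    Working in log-space avoids overflow when p is large.
    """
    dim = 2 * p
    return dim * math.log(period) - log_ball_volume(dim, mean_radius)


def gradient_descent(grad, x0, lr=0.05, steps=2000):
    """Plain fixed-step gradient descent; returns the final point."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def basin_radius(grad, minimum, direction, r_max, tol=1e-3, same_tol=1e-2):
    """Hypothetical radius probe: bisect for the largest step along
    `direction` from which gradient descent still returns to `minimum`.

    Assumes `r_max` lies outside the basin and that the basin is crossed
    only once along the ray.
    """
    minimum = np.asarray(minimum, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        end = gradient_descent(grad, minimum + mid * direction)
        if np.linalg.norm(end - minimum) < same_tol:
            lo = mid  # still inside the basin
        else:
            hi = mid  # descended to a different minimum
    return lo


def toy_grad(x):
    """Gradient of the toy landscape f(x) = -sum(cos(x_i))."""
    return np.sin(x)


if __name__ == "__main__":
    # Probe the basin of the minimum at the origin along one axis.
    r = basin_radius(toy_grad, np.zeros(2), np.array([1.0, 0.0]), r_max=4.0)
    print("estimated radius:", r)
    print("estimated minima per period cell:", math.exp(log_num_minima(r, p=1)))
```

In this simplified model, the log-count contains the term 2p·log(period/radius). If the mean radius keeps shrinking toward zero as n grows while p is proportional to log(n), that term alone outgrows any constant multiple of log(n), which is the sense in which the Figure 5 caption's observations point to superpolynomial growth. Real basins are not balls, so the absolute count from such a model should be read as a rough proxy only.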