
Laziness, Barren Plateau, and Noise in Machine Learning

Junyu Liu, Zexi Lin, Liang Jiang

TL;DR

The paper defines laziness as a large suppression of variational parameter updates in quantum circuits and differentiates it from barren plateaus, arguing that laziness need not impede learning in the overparametrized regime. It formulates a quantum neural tangent kernel (QNTK) framework and shows that, under random 2-designs, the average kernel scales as $\bar{K} \approx \frac{2L \mathrm{Tr}(O^2)}{N^2}$ and concentrates for large $L$, enabling exponential decay of the residual error via $\varepsilon(t) = (1 - \eta K)^t \varepsilon(0)$; this provides a precision-based view that laziness does not equate to algorithmic failure and helps explain training dynamics in variational quantum algorithms. The work further analyzes the impact of measurement and control noise, deriving a stochastic recurrence that yields a late-time plateau $\mathcal{L}(\infty) \approx \frac{\sigma_\theta^2}{2\eta(2 - \eta K)}$, and suggests operating in the overparametrized regime with $\eta K = O(1)$ to maintain performance despite noise. Finally, it connects quantum and classical learning via the neural tangent kernel perspective, discusses design trade-offs between expressibility and barren-plateau avoidance, and outlines directions for near-term quantum devices and broader theoretical links to classical machine learning.
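
The exponential decay $\varepsilon(t) = (1 - \eta K)^t \varepsilon(0)$ follows from gradient descent on the mean square loss $\mathcal{L} = \varepsilon^2/2$, provided the kernel $K = \sum_\ell (\partial \varepsilon / \partial \theta_\ell)^2$ stays fixed during training. Below is a minimal sketch under that assumption. A hypothetical linearized model $\varepsilon(\theta) = \varepsilon(0) + J \cdot \theta$, with a random Jacobian $J$, stands in for the circuit. It is trained with plain gradient descent and compared against the closed form. The function name, the number of angles, and the learning rate are illustrative and do not come from the paper.

```python
import numpy as np

def linearized_gd(eps0, J, eta, steps):
    """Gradient descent on L = eps^2 / 2 for the linear model eps(theta) = eps0 + J @ theta."""
    theta = np.zeros_like(J)  # start at theta = 0 (illustrative choice)
    history = []
    for _ in range(steps):
        eps = eps0 + J @ theta
        history.append(eps)
        theta = theta - eta * eps * J  # dL/dtheta = eps * d(eps)/d(theta) = eps * J
    return np.array(history)

rng = np.random.default_rng(1)
J = rng.standard_normal(50)   # hypothetical Jacobian for L = 50 angles
K = float(J @ J)              # kernel K = sum_l (d eps / d theta_l)^2
eta = 0.5 / K                 # learning rate chosen so that eta * K = 0.5
steps = 20
traj = linearized_gd(1.0, J, eta, steps)
closed_form = (1 - eta * K) ** np.arange(steps) * 1.0
print(np.max(np.abs(traj - closed_form)))  # agreement up to floating-point rounding
```

Each step multiplies $\varepsilon$ by $1 - \eta K$, so the residual shrinks only when $0 < \eta K < 2$. Choosing $\eta K = \mathcal{O}(1)$, as suggested above, gives fast convergence even when every individual angle moves very little.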

Abstract

We define \emph{laziness} to describe a large suppression of variational parameter updates for neural networks, classical or quantum. In the quantum case, the suppression is exponential in the number of qubits for randomized variational quantum circuits. We discuss the difference between laziness and the \emph{barren plateau}, a term introduced in quantum machine learning in \cite{mcclean2018barren} to describe the flatness of the loss function landscape during gradient descent. We provide a new theoretical understanding of these two phenomena in light of the theory of neural tangent kernels. For noiseless quantum circuits, without measurement noise, the loss function landscape is complicated in the overparametrized regime, where there is a large number of trainable variational angles. Instead of a barren landscape, around a random starting point in optimization there are large numbers of local minima that are good enough to minimize the mean square loss function; there we still have quantum laziness, but we do not have barren plateaus. However, this complicated landscape is not visible within a limited number of iterations and with low precision in quantum control and quantum sensing. Moreover, we study the effect of noise during optimization under intuitive noise models, and show that variational quantum algorithms are noise-resilient in the overparametrized regime. Our work precisely reformulates the quantum barren plateau statement as a precision statement and justifies this statement in certain noise models, injects new hope toward near-term variational quantum algorithms, and provides theoretical connections to classical machine learning. Our paper provides conceptual perspectives on quantum barren plateaus, together with discussions of the gradient descent dynamics in \cite{together}.
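
To illustrate the exponential suppression of parameter updates described above, here is a minimal toy sketch. It does not use the paper's ansatz. Instead, it assumes a single rotation $e^{-i\theta G}$ sandwiched between two Haar-random unitaries, $|\psi(\theta)\rangle = V e^{-i\theta G} W|0\cdots0\rangle$, with $G = O = Z$ on the first qubit (our choice). The code estimates the mean squared gradient $\mathbb{E}[(\partial_\theta \langle O\rangle)^2]$ at $\theta = 0$ with the parameter-shift rule for several qubit counts $n$. It then compares the estimate with $2\,\mathrm{Tr}(O^2)/N^2 = 2/N$, where $N = 2^n$. That comparison value is the $L = 1$ term of the average-kernel expression in the TL;DR, assuming rotations are written as $e^{-i\theta G}$. With $e^{-i\theta G/2}$ the squared gradient is four times smaller. All helper names are hypothetical.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Haar-random unitary: QR of a complex Gaussian matrix with the column phases fixed."""
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def expectation(theta, V, phi, g, o):
    """<O> in |psi(theta)> = V exp(-i theta G) |phi>, for diagonal G and O stored as vectors g, o."""
    psi = V @ (np.exp(-1j * theta * g) * phi)
    return float(np.real(np.vdot(psi, o * psi)))

def mean_squared_gradient(n_qubits, samples, rng):
    """Monte Carlo estimate of E[(d<O>/d theta)^2] at theta = 0 over Haar-random V and W."""
    dim = 2 ** n_qubits
    z_first = np.where(np.arange(dim) < dim // 2, 1.0, -1.0)  # Pauli Z on the first qubit
    total = 0.0
    for _ in range(samples):
        W, V = haar_unitary(dim, rng), haar_unitary(dim, rng)
        phi = W[:, 0]  # W|0...0>, a Haar-random state
        # Parameter-shift rule for exp(-i theta G) with G^2 = I: f'(0) = f(pi/4) - f(-pi/4)
        grad = (expectation(np.pi / 4, V, phi, z_first, z_first)
                - expectation(-np.pi / 4, V, phi, z_first, z_first))
        total += grad ** 2
    return total / samples

rng = np.random.default_rng(0)
results = {n: mean_squared_gradient(n, 400, rng) for n in (2, 4, 6)}
for n, msg in results.items():
    N = 2 ** n
    # Compare with 2 Tr(O^2) / N^2 = 2 / N for a Pauli observable (Tr(O^2) = N)
    print(f"n={n}  N={N:3d}  E[grad^2]={msg:.4f}  2/N={2 / N:.4f}")
```

The estimate approaches $2/N$ at larger $n$; finite-size corrections are visible at $n = 2$. Each added qubit roughly halves the mean squared gradient. As a result, the update $\delta\theta = -\eta\,\varepsilon\,\partial_\theta\varepsilon$ of any single angle is suppressed exponentially in $n$, which is the laziness described above. Now suppose each of $L$ angles is surrounded by a 2-design, as assumed in the TL;DR. Summing over the angles gives $K \approx 2L\,\mathrm{Tr}(O^2)/N^2 = 2L/N$ here. Keeping $\eta K = \mathcal{O}(1)$ at fixed $\eta$ therefore requires $L$ on the order of $N$. This is the overparametrized regime, where laziness does not stop the residual error from decaying.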

Paper Structure (12 sections, 59 equations, 3 figures)

This paper contains 12 sections, 59 equations, 3 figures.

Figures (3)

  • Figure 1: Density plots of the loss function landscape comparing usual and overparametrized variational quantum circuits. We illustrate the landscape by color plots of the loss function for two variational angles. Left: the traditional understanding of barren plateaus, where there is a single optimal point. Right: in the overparametrized case, the landscape is not barren, since from a random initial point there are many good-enough local optima that minimize the loss function. Note that these plots are schematic, since the loss function landscape cannot be plotted directly in very high dimensions. To see this landscape within $\mathcal{O}(1)$ iterations, the number of trainable angles $L$ may need to be comparable to the Hilbert-space dimension $N$.
  • Figure 2: Noise standard deviation $\sigma_\theta$ as a function of the standard deviation $\sigma_\varepsilon$ of the final residual error after sufficiently long training, showing numerical results (black dots) and the theoretical prediction (red line). In this figure, $\eta=0.005$, $K\approx 25$, $\varepsilon(0)\approx 1$.
  • Figure 3: Standard deviation $\sigma_\varepsilon$ of the final residual error as a function of the learning rate $\eta$ after sufficiently long training, showing numerical results (black dots) and the theoretical prediction (red line). In this figure, $\sigma_\theta=0.005$, $K\approx 35$, $\varepsilon(0)\approx 1$, $t=100$.
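
The late-time plateau studied in Figures 2 and 3 can be checked with a small simulation. The sketch below assumes a specific noise model: after each gradient step, every angle receives independent Gaussian noise of standard deviation $\sigma_\theta$. It also uses a hypothetical linearized model $\varepsilon(\theta) = \varepsilon(0) + J\cdot\theta$ with a fixed kernel $K = J \cdot J$. Under these assumptions the residual obeys $\varepsilon(t+1) = (1-\eta K)\,\varepsilon(t) + \xi(t)$ with $\mathrm{Var}[\xi] = K\sigma_\theta^2$. Its stationary variance is $\sigma_\varepsilon^2 = \sigma_\theta^2/[\eta(2-\eta K)]$, which gives $\mathcal{L}(\infty) = \sigma_\varepsilon^2/2 = \sigma_\theta^2/[2\eta(2-\eta K)]$ as quoted in the TL;DR. The values $\eta = 0.005$, $K = 25$, and $\varepsilon(0) = 1$ follow Figure 2. The number of angles, the trials, the step count, and the $\sigma_\theta$ values are illustrative.

```python
import numpy as np

def final_residual_std(sigma_theta, eta, K, eps0=1.0, n_angles=50, trials=2000, steps=300, seed=0):
    """Std of eps after `steps` noisy gradient-descent steps, across independent trials."""
    rng = np.random.default_rng(seed)
    J = rng.standard_normal(n_angles)
    J *= np.sqrt(K) / np.linalg.norm(J)      # rescale so that J @ J == K
    theta = np.zeros((trials, n_angles))     # one row of angles per trial
    for _ in range(steps):
        eps = eps0 + theta @ J               # residual for every trial, shape (trials,)
        theta -= eta * eps[:, None] * J      # gradient step on L = eps^2 / 2
        theta += sigma_theta * rng.standard_normal(theta.shape)  # assumed control noise
    eps = eps0 + theta @ J
    return float(np.std(eps))

eta, K = 0.005, 25.0
rows = []
for sigma_theta in (0.001, 0.002, 0.004):
    predicted = sigma_theta / np.sqrt(eta * (2 - eta * K))
    simulated = final_residual_std(sigma_theta, eta, K)
    rows.append((sigma_theta, simulated, predicted))
    print(f"sigma_theta={sigma_theta:.3f}  simulated={simulated:.4f}  predicted={predicted:.4f}")
```

The simulated $\sigma_\varepsilon$ grows linearly with $\sigma_\theta$, consistent with the formula. You can instead sweep `eta` at fixed `sigma_theta` to approximate the Figure 3 setting. Because that figure stops at $t = 100$, pass `steps=100`. When $\eta K$ is small, the residual may not yet have reached the stationary value, so some deviation from the formula is expected.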