Laziness, Barren Plateau, and Noise in Machine Learning

Junyu Liu; Zexi Lin; Liang Jiang

TL;DR

The paper defines laziness as a large suppression of variational parameter updates in quantum circuits and differentiates it from barren plateaus, arguing that laziness need not impede learning in the overparametrized regime. It formulates a quantum neural tangent kernel (QNTK) framework and shows that, under random 2-designs, the average kernel scales as $\bar{K} \approx \frac{2L \mathrm{Tr}(O^2)}{N^2}$ and concentrates for large $L$, enabling exponential decay of the residual error via $\varepsilon(t) = (1 - \eta K)^t \varepsilon(0)$; this provides a precision-based view that laziness does not equate to algorithmic failure and helps explain training dynamics in variational quantum algorithms. The work further analyzes the impact of measurement and control noise, deriving a stochastic recurrence that yields a late-time plateau $\mathcal{L}(\infty) \approx \frac{\sigma_\theta^2}{2\eta(2 - \eta K)}$, and suggests operating in the overparametrized regime with $\eta K = O(1)$ to maintain performance despite noise. Finally, it connects quantum and classical learning via the neural tangent kernel perspective, discusses design trade-offs between expressibility and barren-plateau avoidance, and outlines directions for near-term quantum devices and broader theoretical links to classical machine learning.
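The decay law and the late-time plateau quoted above can be checked with a small Monte Carlo experiment. The sketch below is illustrative only and rests on assumptions that are not stated in this summary: the loss is taken to be $\mathcal{L} = \frac{1}{2}\varepsilon^2$, and the noise enters each step as an independent Gaussian kick of standard deviation $\sqrt{K}\,\sigma_\theta$ (the choice under which a linear recurrence reproduces the quoted plateau). The function names `simulate_residual` and `plateau_loss`, and the value of `sigma_theta`, are hypothetical.

```python
import numpy as np

def simulate_residual(eta, K, sigma_theta, eps0=1.0, steps=500, runs=20000, seed=0):
    """Iterate eps <- (1 - eta*K) * eps + sqrt(K) * sigma_theta * xi over many runs.

    ASSUMPTION: xi ~ N(0, 1), independent at every step and in every run.
    With sigma_theta = 0 this reduces to eps(t) = (1 - eta*K)**t * eps(0).
    """
    rng = np.random.default_rng(seed)
    eps = np.full(runs, eps0, dtype=float)
    for _ in range(steps):
        kick = np.sqrt(K) * sigma_theta * rng.standard_normal(runs)
        eps = (1.0 - eta * K) * eps + kick
    return eps

def plateau_loss(eta, K, sigma_theta):
    """Late-time plateau from the TL;DR: sigma_theta^2 / (2 * eta * (2 - eta*K))."""
    return sigma_theta**2 / (2.0 * eta * (2.0 - eta * K))

# eta and K follow the Figure 2 caption; sigma_theta is an arbitrary illustrative value.
eta, K, sigma_theta = 0.005, 25.0, 0.01
eps = simulate_residual(eta, K, sigma_theta)
empirical = 0.5 * np.mean(eps**2)   # ASSUMPTION: loss = eps^2 / 2
predicted = plateau_loss(eta, K, sigma_theta)
print(f"empirical plateau: {empirical:.3e}, predicted plateau: {predicted:.3e}")
```

Under these assumptions the stationary variance of the recurrence is $K\sigma_\theta^2 / \left(1 - (1 - \eta K)^2\right) = \sigma_\theta^2 / \left(\eta(2 - \eta K)\right)$, and half of that is the plateau value quoted above.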

Abstract

We define \emph{laziness} to describe a large suppression of variational parameter updates for neural networks, classical or quantum. In the quantum case, the suppression is exponential in the number of qubits for randomized variational quantum circuits. We discuss the difference between laziness and the \emph{barren plateau} in quantum machine learning, a term introduced by quantum physicists in \cite{mcclean2018barren} for the flatness of the loss function landscape during gradient descent. We present a novel theoretical understanding of these two phenomena in light of the theory of neural tangent kernels. For noiseless quantum circuits, without measurement noise, the loss function landscape is complicated in the overparametrized regime with a large number of trainable variational angles. Rather than being flat, the landscape around a random starting point in optimization contains large numbers of local minima that are good enough and could minimize the mean square loss function, where we still have quantum laziness, but we do not have barren plateaus. However, the complicated landscape is not visible within a limited number of iterations or with low precision in quantum control and quantum sensing. Moreover, we look at the effect of noise during optimization by assuming intuitive noise models, and show that variational quantum algorithms are noise-resilient in the overparametrized regime. Our work precisely reformulates the quantum barren plateau statement as a precision statement and justifies the statement in certain noise models, injects new hope toward near-term variational quantum algorithms, and provides theoretical connections to classical machine learning. Our paper provides conceptual perspectives on quantum barren plateaus, together with discussions about the gradient descent dynamics in \cite{together}.
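To make the precision reading concrete, the following sketch combines the two formulas quoted in the TL;DR, $\bar{K} \approx 2L\,\mathrm{Tr}(O^2)/N^2$ and $\varepsilon(t) = (1 - \eta K)^t \varepsilon(0)$, to estimate how many noiseless gradient steps are needed to reach a target residual. It is a rough illustration under stated assumptions: $N = 2^n$ for $n$ qubits, and the observable $O$ defaults to a Pauli string so that $\mathrm{Tr}(O^2) = N$. The helper names `mean_qntk` and `steps_to_precision` are hypothetical and do not come from the paper.

```python
import math

def mean_qntk(num_params, num_qubits, trace_O2=None):
    """Average kernel K ~ 2 * L * Tr(O^2) / N^2 from the TL;DR, with N = 2**num_qubits.

    ASSUMPTION: if trace_O2 is omitted, O is a Pauli string, so O^2 = I and Tr(O^2) = N.
    """
    N = 2 ** num_qubits
    if trace_O2 is None:
        trace_O2 = N
    return 2.0 * num_params * trace_O2 / N**2

def steps_to_precision(eta, K, eps0, target):
    """Smallest t with |1 - eta*K|**t * eps0 <= target, using the noiseless decay law."""
    rate = abs(1.0 - eta * K)
    if not 0.0 < rate < 1.0:
        raise ValueError("need 0 < |1 - eta*K| < 1 for geometric decay")
    if target >= eps0:
        return 0
    return math.ceil(math.log(target / eps0) / math.log(rate))

# At a fixed number of angles L, the kernel shrinks as qubits are added,
# so the step count for a fixed target precision grows.
for n in (4, 8, 12):
    K = mean_qntk(num_params=100, num_qubits=n)
    if 0.0 < abs(1.0 - 0.05 * K) < 1.0:
        print(n, K, steps_to_precision(eta=0.05, K=K, eps0=1.0, target=1e-3))
```

With these defaults $\bar{K} \approx 2L/N$, so keeping $\eta K = O(1)$ at a fixed learning rate requires $L$ to grow in proportion to the Hilbert space dimension $N$.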

Paper Structure (12 sections, 59 equations, 3 figures)

Barren plateau, laziness and noise
The loss function landscape and the QNTK theory
Precision and noise
Conclusion and outlook
Comments on the barren plateau in the classical machine learning
The fundamental difference between barren plateau and vanishing gradient
Classical large-width neural network has laziness as well
Classical large-width neural network could still learn efficiently
Some further details about concentration conditions
A physical interpretation
Noises
Numerical results

Figures (3)

Figure 1: Density plots of the loss function landscape comparing usual and overparametrized variational quantum circuits. We illustrate the landscape by color plots of the loss function for two variational angles. Left: the traditional understanding of barren plateaus, where there is a single optimal point. Right: in the overparametrized case, the landscape is not barren, since around a random initial point there are many good-enough local optima that could minimize the loss function. Note that these plots are schematic, since it is not possible to plot the loss function landscape directly in very high dimensions. To observe this landscape within $\mathcal{O}(1)$ iterations, the number of trainable angles $L$ might have to be comparable to the dimension of the Hilbert space $N$.
Figure 2: Noise standard deviation $\sigma_\theta$ as a function of the standard deviation of the final residual error $\sigma_\varepsilon$ after sufficiently long training, showing numerical results (black dots) and the theoretical prediction (red line). In this figure, $\eta=0.005$, $K\approx 25$, $\varepsilon(0)\approx 1$.
Figure 3: Standard deviation of the final residual error $\sigma_\varepsilon$ as a function of the learning rate $\eta$ after sufficiently long training, showing numerical results (black dots) and the theoretical prediction (red line). In this figure, $\sigma_\theta=0.005$, $K\approx 35$, $\varepsilon(0)\approx 1$, $t=100$.
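For orientation, the parameter values in the captions of Figures 2 and 3 can be plugged into a closed-form estimate of $\sigma_\varepsilon$. This is a sketch under assumptions rather than the paper's own prediction curve: it assumes a linear recurrence $\varepsilon(t+1) = (1 - \eta K)\varepsilon(t) + \sqrt{K}\,\sigma_\theta\,\xi_t$ with independent standard normal $\xi_t$, which is consistent with the plateau $\sigma_\theta^2 / (2\eta(2 - \eta K))$ quoted in the TL;DR when $\mathcal{L} = \frac{1}{2}\varepsilon^2$. Whether the red lines in the figures use exactly this expression is not confirmed here, and `residual_std` is a hypothetical helper.

```python
import numpy as np

def residual_std(eta, K, sigma_theta, t=None):
    """Standard deviation of the residual error under the assumed linear noisy recurrence.

    t=None gives the stationary value sigma_theta / sqrt(eta * (2 - eta*K)).
    A finite t multiplies the variance by 1 - (1 - eta*K)**(2*t).
    ASSUMPTION: 0 < eta*K < 2, so the recurrence is stable.
    """
    eta = np.asarray(eta, dtype=float)
    r = 1.0 - eta * K
    stationary_var = sigma_theta**2 / (eta * (2.0 - eta * K))
    if t is None:
        return np.sqrt(stationary_var)
    return np.sqrt(stationary_var * (1.0 - r ** (2 * t)))

# Figure 2 settings (eta = 0.005, K ~ 25): sigma_eps is proportional to sigma_theta.
slope = residual_std(0.005, 25.0, 1.0)
print(f"Figure 2 settings: sigma_eps ~ {slope:.2f} * sigma_theta")

# Figure 3 settings (sigma_theta = 0.005, K ~ 35, t = 100): sweep the learning rate.
# The eta grid below is an illustrative choice, not taken from the figure.
etas = np.linspace(0.001, 0.05, 6)
for eta, s in zip(etas, residual_std(etas, 35.0, 0.005, t=100)):
    print(f"eta = {eta:.4f}  sigma_eps ~ {s:.4f}")
```

At small $\eta$ and finite $t$ the factor $1 - (1 - \eta K)^{2t}$ is noticeably below one, so the finite-time estimate sits below the stationary value until the decay has saturated.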
