Trained quantum neural networks are Gaussian processes
Filippo Girardi, Giacomo De Palma
TL;DR
The paper analyzes quantum neural networks built from parametric single-qubit gates and fixed two-qubit gates in the infinite-width limit $m\to\infty$. It proves that the outputs of the untrained network converge to a Gaussian process (GP) with covariance $\mathcal{K}$, provided each measured qubit is correlated with only a few other measured qubits, and that gradient-flow training preserves the Gaussian-process description: the empirical neural tangent kernel concentrates around a fixed analytic kernel $\bar{K}$, and the linearized model yields an explicit Gaussian-process evolution. Combining these concentration results with a lazy-training argument, the authors show that the trained network remains a GP at all training times, with mean and covariance evolving in closed form via $\bar{K}$. A polynomial number of measurements suffices for these conclusions to hold even in the presence of statistical measurement noise. The work extends the classical correspondence between wide neural networks and Gaussian processes to quantum circuits whose depth may grow with the number of qubits, provided barren plateaus are avoided. It thereby offers a rigorous foundation for trainability and generalization in variational quantum architectures and clarifies the conditions under which quantum advantages may emerge.
Abstract
We study quantum neural networks made of parametric one-qubit gates and fixed two-qubit gates in the limit of infinite width, where the generated function is the expectation value of the sum of single-qubit observables over all the qubits. First, we prove that the probability distribution of the function generated by the untrained network with randomly initialized parameters converges in distribution to a Gaussian process whenever each measured qubit is correlated with only a few other measured qubits. Then, we analytically characterize the training of the network via gradient descent with square loss on supervised learning problems. We prove that, as long as the network is not affected by barren plateaus, the trained network can perfectly fit the training set and that the probability distribution of the function generated after training still converges in distribution to a Gaussian process. Finally, we consider the statistical noise of the measurement at the output of the network and prove that a polynomial number of measurements is sufficient for all the previous results to hold and that the network can always be trained in polynomial time.
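The claim that the trained network can perfectly fit the training set can be illustrated with a minimal sketch of linearized (lazy) training under a fixed kernel. It assumes the standard kernel gradient-flow dynamics $\dot f_t = -\eta K (f_t - y)$ on the training inputs, whose solution is $f_t = y + e^{-\eta K t}(f_0 - y)$. The analytic form of the limiting kernel is not given in this section, so an RBF kernel is used purely as a placeholder; the function names, the learning rate `lr`, the time normalization, and the toy data are all hypothetical and not taken from the paper.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Placeholder positive-definite kernel (an assumption, NOT the paper's kernel)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)


def linearized_train_outputs(K, f0, y, t, lr=1.0):
    """Outputs on the training inputs after kernel gradient flow for time t.

    Solves d f_t / dt = -lr * K (f_t - y), whose closed form is
    f_t = y + exp(-lr * K * t) (f0 - y), using the eigendecomposition of K.
    """
    eigvals, eigvecs = np.linalg.eigh(K)      # K is symmetric
    decay = np.exp(-lr * eigvals * t)         # per-eigenmode decay factor
    residual0 = f0 - y                        # initial training error
    return y + eigvecs @ (decay * (eigvecs.T @ residual0))


# Toy supervised problem (hypothetical data).
rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 5)
y_train = np.sin(np.pi * x_train)

K = rbf_kernel(x_train, x_train, length_scale=0.3)
f0 = rng.normal(size=x_train.shape)           # stand-in for the random outputs at initialization

times = [0.0, 1.0, 5.0, 50.0]
residual_norms = [
    np.linalg.norm(linearized_train_outputs(K, f0, y_train, t) - y_train)
    for t in times
]
for t, r in zip(times, residual_norms):
    print(f"t = {t:5.1f}   ||f_t - y|| = {r:.3e}")
```

Because the placeholder kernel matrix is positive definite, every eigenmode of the training error decays exponentially, so the residual shrinks monotonically toward zero. In this linearized picture the outputs are an affine function of the initial outputs, which is why a Gaussian distribution at initialization stays Gaussian during training.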
