Trained quantum neural networks are Gaussian processes

Filippo Girardi, Giacomo De Palma

TL;DR

The paper analyzes quantum neural networks built from parametric single-qubit gates and fixed two-qubit gates in the infinite-width limit $m\to\infty$, where the width $m$ is the number of qubits. It proves that the outputs of the untrained network converge to a Gaussian process (GP) with covariance $\mathcal{K}$ under a locality constraint (each measured qubit is correlated with only a few other measured qubits), and that gradient-flow training preserves the GP description: the empirical neural tangent kernel concentrates around a fixed analytic kernel $\bar{K}$, and the linearized model yields an explicit GP evolution. By combining these concentration results with a lazy-training argument, the authors show that the trained network remains a GP at every training time, with mean and covariance evolving in closed form via $\bar{K}$; polynomially many measurements suffice for these conclusions to hold even in the presence of measurement noise. The work extends the classical results on wide neural networks as Gaussian processes to quantum circuits whose depth may grow with the number of qubits, provided barren plateaus are avoided, thereby offering a rigorous foundation for studying trainability and generalization in variational quantum architectures and clarifying conditions under which quantum advantages may emerge.
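To make the "closed form via $\bar{K}$" statement concrete, here is a minimal NumPy sketch of the standard linearized (lazy-training) solution known from classical wide networks. It is an illustration under stated assumptions, not the paper's exact statement: the initial function is taken to be a zero-mean GP, the loss is $\tfrac12\|f(X)-y\|^2$ with gradient-flow rate `eta`, and the paper's normalization may differ. The function names and the `rbf` kernel are hypothetical stand-ins; in the paper the kernels $\mathcal{K}$ and $\bar{K}$ are determined by the circuit.

```python
import numpy as np


def rbf(a, b, scale):
    """Hypothetical stand-in kernel on 1-D inputs (not derived from any circuit)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / scale) ** 2)


def trained_gp_moments(ntk_xX, ntk_XX, cov_xx, cov_xX, cov_XX, y, eta, t):
    """Mean and covariance at time t of the linearized model under gradient flow.

    ntk_* : blocks of the limiting tangent kernel (K-bar in the text)
    cov_* : blocks of the covariance of the untrained GP
    x = test inputs, X = training inputs, y = training labels.
    """
    # Eigendecompose the symmetric train-train tangent kernel.
    w, V = np.linalg.eigh(ntk_XX)
    w = np.clip(w, 0.0, None)  # guard against tiny negative round-off
    tiny = w < 1e-12
    safe_w = np.where(tiny, 1.0, w)
    # Spectrum of K^{-1} (I - exp(-eta K t)); its w -> 0 limit is eta * t.
    g = np.where(tiny, eta * t, -np.expm1(-eta * safe_w * t) / safe_w)
    # A = K(x, X) K^{-1} (I - exp(-eta K t))
    A = ntk_xX @ (V * g) @ V.T
    # f_t(x) = f_0(x) - A (f_0(X) - y), with f_0 a zero-mean GP.
    mean = A @ y
    cov = cov_xx - A @ cov_xX.T - cov_xX @ A.T + A @ cov_XX @ A.T
    return mean, cov


# Toy usage: three training points, evaluated on the training inputs themselves.
X_train = np.array([0.0, 1.0, 2.0])
y_train = np.array([0.5, -0.3, 0.1])
K_bar = rbf(X_train, X_train, 0.5)   # stand-in for the tangent kernel
K_init = rbf(X_train, X_train, 1.0)  # stand-in for the initial covariance
for time in (0.0, 1.0, 200.0):
    m, c = trained_gp_moments(K_bar, K_bar, K_init, K_init, K_init,
                              y_train, eta=1.0, t=time)
    print(time, m, np.diag(c))
```

In this sketch the moments start at the untrained GP for `t = 0`; on the training inputs the mean approaches the labels and the variance shrinks as `t` grows, which is the linearized counterpart of the perfect-fit statement.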

Abstract

We study quantum neural networks made by parametric one-qubit gates and fixed two-qubit gates in the limit of infinite width, where the generated function is the expectation value of the sum of single-qubit observables over all the qubits. First, we prove that the probability distribution of the function generated by the untrained network with randomly initialized parameters converges in distribution to a Gaussian process whenever each measured qubit is correlated only with few other measured qubits. Then, we analytically characterize the training of the network via gradient descent with square loss on supervised learning problems. We prove that, as long as the network is not affected by barren plateaus, the trained network can perfectly fit the training set and that the probability distribution of the function generated after training still converges in distribution to a Gaussian process. Finally, we consider the statistical noise of the measurement at the output of the network and prove that a polynomial number of measurements is sufficient for all the previous results to hold and that the network can always be trained in polynomial time.
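As a rough illustration of why a polynomial number of measurements can be enough to control statistical noise, the sketch below applies Hoeffding's inequality to a single bounded observable. This is a generic textbook bound, not the paper's theorem: the outcome range $[-1, 1]$, the function name, and its parameters are assumptions made for the example, and the paper's result covers the whole training procedure rather than a single estimate.

```python
import math


def shots_for_precision(eps, delta, lo=-1.0, hi=1.0):
    """Number of i.i.d. measurements so that the sample mean of an outcome
    bounded in [lo, hi] is within eps of its expectation with probability
    at least 1 - delta (two-sided Hoeffding bound)."""
    if eps <= 0 or not (0 < delta < 1) or hi <= lo:
        raise ValueError("need eps > 0, 0 < delta < 1 and lo < hi")
    # Hoeffding: P(|mean - E| >= eps) <= 2 exp(-2 N eps^2 / (hi - lo)^2)
    return math.ceil((hi - lo) ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


# The count grows like 1/eps^2 and only logarithmically in 1/delta.
for eps in (0.1, 0.05, 0.01):
    print(eps, shots_for_precision(eps, delta=0.05))
```

The required count scales polynomially in $1/\varepsilon$ and logarithmically in $1/\delta$, which is the kind of dependence a polynomial measurement budget relies on.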

Paper Structure (59 sections, 67 theorems, 743 equations, 25 figures, 3 tables)


Key Result

Lemma 2.13

As a consequence of the definition of light cones, we have a procedure to construct the extended future light cones of the parameters using the family of interactions $\mathcal{I}_U$:

Figures (25)

  • Figure 1: On the left, an internal structure of $V(x)$ that is forbidden according to Definition \ref{deflayer}. The one on the right is allowed.
  • Figure 2: Our parameterized quantum circuit.
  • Figure 3: The layer-qubit representation.
  • Figure 4: The general structure of a feature encoding layer (above) for $m=3$, $\dim\mathcal{X}=3$ and $S_\ell = \{ \{1,2\},\{3\}\}$ according to (\ref{Vx}), with a couple of examples.
  • Figure 5: Extended light cone $\mathcal{M}_{13}=\{1,6,7\}$ of the parameter $\theta_{13}$ for the circuit in the figure. Here $m=7$, $\mathcal{X}=[0,\pi]^2$, $|\Theta|=21$. Informally, the set $\mathcal{M}_{13}$ is the answer to the question: what are all the observables that may depend on the parameter $\theta_{13}$?
  • ...and 20 more figures

Theorems & Definitions (170)

  • Remark 2.2: The case of infinite input space
  • Definition 2.3: Mean squared error
  • Definition 2.4
  • Definition 2.5: Layer-qubit representation
  • Remark 2.7
  • Remark 2.8
  • Remark 2.9
  • Remark 2.10
  • Definition 2.11: Light cones
  • Definition 2.12: Extended light cones
  • ...and 160 more