A Unified Theory of Quantum Neural Network Loss Landscapes

Eric R. Anschuetz

TL;DR

This paper provides a unified theory for quantum neural network (QNN) loss landscapes by introducing the Jordan algebraic Wishart system (JAWS) framework, which shows that wide QNNs have loss functions that converge to Wishart processes rather than Gaussian processes. By decomposing the algebra generated by QNN observables into simple Jordan components, the author derives an explicit asymptotic description in which the loss offset from the optimum is a sum of independent Wishart contributions, $\ell(\bm{\rho};\bm{\theta}) - \ell^\ast \rightsquigarrow \sum_\alpha \frac{I_\alpha \overline{o}_\alpha}{r_\alpha} \mathrm{Tr}_\alpha(\bm{\rho}^\alpha \bm{W}^\alpha)$, with each $\bm{W}^\alpha \sim \mathcal{W}_n^\beta(r_\alpha, \bm{I})$ and degrees of freedom $r_\alpha = \left\lceil \frac{\mathrm{Tr}_\alpha(\bm{O}^\alpha)^2}{\mathrm{Tr}_\alpha((\bm{O}^\alpha)^2)} \right\rceil$. This framework unifies prior results on barren plateaus, quantum neural tangent kernel (QNTK) behavior, and local minima, and provides a practical, architecture-dependent measure of trainability based on the degrees of freedom. The work also shows that Gaussian-process limits occur only under specific normalization conditions, while most QNNs exhibit non-Gaussian (Wishart) loss landscapes. This has explicit implications for gradient-based training and suggests that potential quantum advantages lie mainly in inference rather than training. Practically, the theory connects trainability to algebraic structure, enabling a quantum algorithmic approach to assess an architecture's asymptotic trainability via traces of observables and inputs. Overall, the JAWS framework offers a rigorous, unifying lens for understanding QNN learning dynamics and highlights both opportunities and limits for quantum-accelerated training.
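The degrees-of-freedom formula above can be evaluated directly from the spectrum of an observable. The following is a minimal sketch, not the paper's procedure: it assumes that $\mathrm{Tr}_\alpha$ acts as the ordinary unnormalized matrix trace, so that the numerator is the squared sum of eigenvalues and the denominator is the sum of squared eigenvalues. The function name `degrees_of_freedom` and the example spectra are hypothetical.

```python
import math


def degrees_of_freedom(eigenvalues, tol=1e-9):
    """Return ceil(Tr(O)^2 / Tr(O^2)) given the eigenvalues of O.

    Assumption: Tr is the ordinary (unnormalized) matrix trace, so
    Tr(O) is the sum of eigenvalues and Tr(O^2) the sum of their squares.
    """
    trace = sum(eigenvalues)
    trace_of_square = sum(x * x for x in eigenvalues)
    if trace_of_square == 0:
        raise ValueError("the observable must be nonzero")
    ratio = trace * trace / trace_of_square
    # Subtract a small tolerance so that float round-off just above an
    # integer does not push the ceiling up by one.
    return math.ceil(ratio - tol)


# Hypothetical spectra, chosen only for illustration.
rank_one_projector = [1.0, 0.0, 0.0, 0.0]
identity = [1.0, 1.0, 1.0, 1.0]
uneven = [3.0, 1.0]

r_projector = degrees_of_freedom(rank_one_projector)
r_identity = degrees_of_freedom(identity)
r_uneven = degrees_of_freedom(uneven)
```

Under this assumption the ratio behaves like an effective rank: by the Cauchy-Schwarz inequality it never exceeds the number of nonzero eigenvalues, and it reaches that number only when the nonzero eigenvalues are all equal.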

Abstract

Classical neural networks with random initialization famously behave as Gaussian processes in the limit of many neurons, which allows one to completely characterize their training and generalization behavior. No such general understanding exists for quantum neural networks (QNNs), which -- outside of certain special cases -- are known to not behave as Gaussian processes when randomly initialized. We here prove that QNNs and their first two derivatives instead generally form what we call "Wishart processes," where certain algebraic properties of the network determine the hyperparameters of the process. This Wishart process description allows us to, for the first time: give necessary and sufficient conditions for a QNN architecture to have a Gaussian process limit; calculate the full gradient distribution, generalizing previously known barren plateau results; and calculate the local minima distribution of algebraically constrained QNNs. Our unified framework suggests a certain simple operational definition for the "trainability" of a given QNN model using a newly introduced, experimentally accessible quantity we call the "degrees of freedom" of the network architecture.

Paper Structure (45 sections, 33 theorems, 318 equations, 3 figures, 2 tables)

Introduction
Motivation
Contributions
Preliminaries
Quantum Neural Networks
Jordan Algebras
Jordan Algebraic Descriptions of Quantum Neural Networks
Quantum Neural Networks Are Wishart Processes
New Results in Landscape Theory From the JAWS Framework
Barren Plateaus
The Quantum Neural Tangent Kernel
Local Minima
Conclusion
Preliminaries for Formal Discussion of Results
Quantum Neural Networks
...and 30 more sections

Key Result

Theorem 1

Consider a QNN with associated JAWS as in Sec. sec:jasa_mt, initialized approximately uniformly at random. Let $\ell^\ast$ be the optimum of the loss and $\overline{o}_\alpha$ the mean eigenvalue of $\bm{O}^\alpha$. Then, as $\dim\left(\mathcal{A}\right)\to\infty$, there is a convergence in joint distribution [...] The $\bm{W}_\alpha$ are each independent Wishart-distributed random matrices in the defining representation [...]
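To get a feel for what a single Wishart contribution $\mathrm{Tr}_\alpha(\bm{\rho}^\alpha \bm{W}^\alpha)/r_\alpha$ looks like, the sketch below samples it by Monte Carlo. Several things here are assumptions rather than statements from the paper: only the real case is simulated (a real Wishart matrix $W = XX^T$ with identity scale), the trace is the ordinary matrix trace, the prefactors $I_\alpha \overline{o}_\alpha$ are left out, and all function names and example inputs are hypothetical.

```python
import random
import statistics


def sample_normalized_trace(rho_eigenvalues, r, rng):
    """Draw one sample of Tr(rho W) / r for a real Wishart W with identity scale.

    W = X X^T with X an n-by-r matrix of independent standard normals.
    Working in the eigenbasis of rho (allowed because this W is rotation
    invariant), Tr(rho W) = sum_i p_i * W_ii with W_ii = sum_k X_ik^2.
    """
    total = 0.0
    for p in rho_eigenvalues:
        if p == 0.0:
            continue  # rows with zero weight do not contribute
        w_ii = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(r))
        total += p * w_ii
    return total / r


def summarize(rho_eigenvalues, r, n_samples=20000, seed=0):
    """Return the empirical mean and variance of the normalized trace."""
    rng = random.Random(seed)
    draws = [sample_normalized_trace(rho_eigenvalues, r, rng)
             for _ in range(n_samples)]
    return statistics.fmean(draws), statistics.pvariance(draws)


# Hypothetical inputs: a pure state and a maximally mixed state on 4 levels.
pure = [1.0, 0.0, 0.0, 0.0]
mixed = [0.25, 0.25, 0.25, 0.25]

pure_mean, pure_var = summarize(pure, r=4)
mixed_mean, mixed_var = summarize(mixed, r=4)
```

In this real-valued toy model the normalized trace has mean 1 and variance $2\sum_i p_i^2 / r$, so the distribution tightens both as the degrees of freedom $r$ grow and as the input becomes more mixed.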

Figures (3)

Figure 1: Loss and derivative densities. (a) The loss density when the quantum neural network has a pure input (i.e., rank-$1$) and when it has a mixed input (i.e., of rank greater than $1$). The distributions are centered at the mean eigenvalue of the objective observable. The mixed input density also illustrates when the input is mixed when projected into any simple component of the Jordan algebra associated with the network. (b) The gradient density conditioned on a nonzero loss function value. The distribution is centered at zero. (c) The density of local minima when the quantum neural network is underparameterized, overparameterized, and when some simple components of the associated Jordan algebra are underparameterized and some are overparameterized (mix of sectors).
Figure 2: Marčenko--Pastur distribution. The density of the Marčenko--Pastur distribution---the asymptotic empirical eigenvalue distribution of normalized Wishart matrices---in the regime where $\gamma\ll 1$ and where $\gamma=1$. At $\gamma=1$ the associated Wishart matrix transitions from being full-rank to being low-rank.
Figure 3: Relation between models for quantum variational loss landscapes. The theory we introduce for the loss landscapes of quantum neural networks (QNNs) is that of Jordan algebraic Wishart systems (JAWS), which relate the algebraic structure of a given QNN architecture to an asymptotic random process description of the loss landscape. This JAWS description reduces to the previously studied Wishart hypertoroidal random fields (WHRFs) and quantum neural tangent kernel (QNTK) in different settings, and also reproduces the loss function variance known for Lie algebra-supported ansatzes (LASAs) and matchgate (MG) networks. Cartoons of the loss landscapes associated with previously studied models are shown.
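The Marčenko--Pastur density named in the Figure 2 caption has a standard closed form, sketched below for unit variance. The convention is an assumption: here $\gamma$ is taken to be the ratio of matrix dimension to degrees of freedom, restricted to $0 < \gamma \le 1$, and the paper's own definition of $\gamma$ may differ. The helper names are hypothetical.

```python
import math


def marchenko_pastur_density(x, gamma):
    """Unit-variance Marchenko-Pastur density at x, for 0 < gamma <= 1.

    Assumption: gamma = (matrix dimension) / (degrees of freedom).
    The support is [(1 - sqrt(gamma))^2, (1 + sqrt(gamma))^2].
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("this sketch only covers 0 < gamma <= 1")
    lower = (1.0 - math.sqrt(gamma)) ** 2
    upper = (1.0 + math.sqrt(gamma)) ** 2
    if x <= lower or x >= upper:
        return 0.0
    return math.sqrt((upper - x) * (x - lower)) / (2.0 * math.pi * gamma * x)


def midpoint_integral(f, a, b, steps=20000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


gamma = 0.25
lower = (1.0 - math.sqrt(gamma)) ** 2
upper = (1.0 + math.sqrt(gamma)) ** 2

# Sanity checks: the density should integrate to 1 and have mean 1.
total_mass = midpoint_integral(
    lambda x: marchenko_pastur_density(x, gamma), lower, upper)
mean_eigenvalue = midpoint_integral(
    lambda x: x * marchenko_pastur_density(x, gamma), lower, upper)
```

As $\gamma$ approaches 1 the lower edge of the support reaches zero, which is the point the caption associates with the change between the full-rank and low-rank regimes.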

Theorems & Definitions (59)

Theorem 1: Quantum neural networks are Wishart processes, informal
Theorem 2: Gradient distribution, informal
Theorem 3: Hessian distribution, informal
Corollary 4: General expression for the loss function variance, informal
Corollary 5: Exact conditions for convergence to a Gaussian process, informal
Corollary 6: Density of local minima, informal
Definition 7: Trainability of quantum neural networks
Theorem 8: Classification of semisimple Euclidean Jordan algebras [Koecher19997]
Definition 9: Haar measure on $G$
Definition 10: $\epsilon$-approximate $t$-designs over $G$
...and 49 more
