A Unified Theory of Quantum Neural Network Loss Landscapes
Eric R. Anschuetz
TL;DR
This paper provides a unified theory for quantum neural network (QNN) loss landscapes by introducing the Jordan algebraic Wishart system (JAWS) framework, which shows that wide QNNs have loss functions that converge to Wishart processes rather than Gaussian processes. By decomposing the algebra generated by QNN observables into simple Jordan components, the author derives an explicit asymptotic description: the loss offset from optimum is a sum of independent Wishart contributions, $\ell(\bm{\rho};\bm{\theta}) - \ell^* \rightsquigarrow \sum_\alpha \frac{I_\alpha \overline{o}_\alpha}{r_\alpha} \mathrm{Tr}_\alpha(\bm{\rho}^\alpha \bm{W}^\alpha)$, with each $\bm{W}^\alpha \sim \mathcal{W}_n^\beta(r_\alpha, \bm{I})$ and degrees of freedom $r_\alpha = \left\lceil \frac{\mathrm{Tr}_\alpha(\bm{O}^\alpha)^2}{\mathrm{Tr}_\alpha((\bm{O}^\alpha)^2)} \right\rceil$. This framework unifies prior results on barren plateaus, QNTK behavior, and local minima, and provides a practical, architecture-dependent measure of trainability based on the degrees of freedom. The work also shows that Gaussian-process limits occur only under specific normalization conditions, while most QNNs exhibit non-Gaussian (Wishart) loss landscapes, with explicit implications for gradient-based training and potential quantum advantages mainly in inference rather than training. Practically, the theory connects trainability to algebraic structure, enabling a quantum algorithmic approach to assess an architecture’s asymptotic trainability via traces of observables and inputs. Overall, the JAWS framework offers a rigorous, unifying lens for understanding QNN learning dynamics and highlights both opportunities and limits for quantum-accelerated training.
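As a rough illustration of the degrees-of-freedom formula above, the following sketch evaluates $\left\lceil \mathrm{Tr}(\bm{O})^2 / \mathrm{Tr}(\bm{O}^2) \right\rceil$ for a single Hermitian block given as a dense matrix. It rests on several assumptions that do not come from the paper: $\mathrm{Tr}_\alpha$ is taken to be the ordinary unnormalized matrix trace (the paper's convention on a given Jordan component may differ), the block $\bm{O}^\alpha$ is assumed to be already extracted, and the function name `degrees_of_freedom` and the rounding tolerance are hypothetical choices made for this example.

```python
import math

import numpy as np


def degrees_of_freedom(obs: np.ndarray, tol: float = 1e-9) -> int:
    """Return ceil(Tr(O)^2 / Tr(O^2)) for one Hermitian block O.

    Assumption: Tr is the ordinary (unnormalized) matrix trace.
    `tol` is an illustrative guard against floating-point noise
    pushing an exact integer ratio just above itself.
    """
    obs = np.asarray(obs, dtype=complex)
    if obs.ndim != 2 or obs.shape[0] != obs.shape[1]:
        raise ValueError("observable block must be a square matrix")
    if not np.allclose(obs, obs.conj().T):
        raise ValueError("observable block must be Hermitian")

    tr_o = np.trace(obs).real            # Tr(O)
    tr_o2 = np.trace(obs @ obs).real     # Tr(O^2)
    if tr_o2 == 0.0:
        raise ValueError("Tr(O^2) is zero; the ratio is undefined")

    return math.ceil(tr_o**2 / tr_o2 - tol)


# A rank-2 projector on a 4-dimensional block: Tr(O) = 2, Tr(O^2) = 2.
projector = np.diag([1.0, 1.0, 0.0, 0.0])
print(degrees_of_freedom(projector))     # 2

# A non-projector spectrum: Tr(O) = 4, Tr(O^2) = 10, ratio 1.6.
print(degrees_of_freedom(np.diag([3.0, 1.0])))  # 2
```

Under this trace convention, a rank-$k$ projector gives exactly $k$, so the quantity behaves like an effective rank of the observable on that component. This matches the intuition that an observable with a spread-out spectrum contributes more Wishart degrees of freedom than one concentrated on a few eigenvalues.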
Abstract
Classical neural networks with random initialization famously behave as Gaussian processes in the limit of many neurons, which allows one to completely characterize their training and generalization behavior. No such general understanding exists for quantum neural networks (QNNs), which -- outside of certain special cases -- are known to not behave as Gaussian processes when randomly initialized. We here prove that QNNs and their first two derivatives instead generally form what we call "Wishart processes," where certain algebraic properties of the network determine the hyperparameters of the process. This Wishart process description allows us to, for the first time: give necessary and sufficient conditions for a QNN architecture to have a Gaussian process limit; calculate the full gradient distribution, generalizing previously known barren plateau results; and calculate the local minima distribution of algebraically constrained QNNs. Our unified framework suggests a certain simple operational definition for the "trainability" of a given QNN model using a newly introduced, experimentally accessible quantity we call the "degrees of freedom" of the network architecture.
