Statistical Complexity of Quantum Learning

Leonardo Banchi; Jason Luke Pereira; Sharu Theresa Jose; Osvaldo Simeone

TL;DR

The paper develops an information-theoretic framework to quantify the statistical complexity of quantum learning, introducing data complexity ($N$), training-copy complexity ($S$), and testing-copy complexity ($V$) alongside model complexity, and analyzes both supervised and unsupervised tasks. By connecting classical statistical learning theory with quantum state discrimination, the authors derive error decompositions into optimality gaps and generalization terms, and compare unconstrained versus constrained operation regimes (e.g., Helstrom discrimination, tomography, kernel methods, and parametric quantum circuits). Key contributions include scaling laws for generalization and knowledge gaps across regimes, principled use of transductive learning for unknown quantum states, and concrete applications to learning phases of matter and entanglement, as well as to classical-shadows and other architectures. The work clarifies how information-theoretic quantities such as trace distance, quantum mutual information, and Rényi entropies govern learnability under realistic resource constraints, offering guidance for designing quantum learning algorithms that balance data efficiency and model expressivity. Overall, the paper provides a unified foundation for assessing quantum learning performance and suggests directions for future research on quantum advantages and generalization guarantees in practical settings.

Abstract

Recent years have seen significant activity on the problem of using data for the purpose of learning properties of quantum systems or of processing classical or quantum data via quantum computing. As in classical learning, quantum learning problems involve settings in which the mechanism generating the data is unknown, and the main goal of a learning algorithm is to ensure satisfactory accuracy levels when only given access to data and, possibly, side information such as expert knowledge. This article reviews the complexity of quantum learning using information-theoretic techniques by focusing on data complexity, copy complexity, and model complexity. Copy complexity arises from the destructive nature of quantum measurements, which irreversibly alter the state to be processed, limiting the information that can be extracted about quantum data. For example, in a quantum system, unlike in classical machine learning, it is generally not possible to evaluate the training loss simultaneously on multiple hypotheses using the same quantum data. To make the paper self-contained and approachable by different research communities, we provide extensive background material on classical results from statistical learning theory, as well as on the distinguishability of quantum states. Throughout, we highlight the differences between quantum and classical learning by addressing both supervised and unsupervised learning, and we provide extensive pointers to the literature.

Paper Structure (64 sections, 11 theorems, 172 equations, 4 figures, 3 tables)

Introduction and Summary
Scope
Examples
Learning Settings
Inductive vs. Transductive Learning
Architectures
Inductive Learning
Transductive Learning
Optimality Gap and Generalization Error
Known Training States
Unknown Training States
Overview of Results with Unconstrained Operations
Overview of Results with Constrained Operations
Applications
Parametric Quantum Circuits
...and 49 more sections

Key Result

Lemma 1.1

Let $X_j$ with $j=1,\dots,n$ be independent identically distributed random variables with mean $\mu = \mathbb E [X_j]$, defined in the interval $a\leq X_j \leq b$. For $c=b-a$ and arbitrary $t$, we get
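The displayed inequality did not survive in this copy. Under the stated assumptions, the standard two-sided form of Hoeffding's inequality is $P\big(\big|\tfrac{1}{n}\sum_{j=1}^n X_j - \mu\big| \geq t\big) \leq 2\exp(-2nt^2/c^2)$ for $t>0$; the paper's exact statement may differ (for example, a one-sided version or a bound on the sum rather than the mean). As a minimal sketch, the following compares that standard bound with an empirical estimate. The function names and the choice of uniform variables on $[0,1]$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def hoeffding_bound(n, t, c):
    """Standard two-sided Hoeffding bound on P(|mean - mu| >= t)."""
    return 2.0 * np.exp(-2.0 * n * t**2 / c**2)

def empirical_deviation_probability(n, t, trials=20000, seed=0):
    """Monte Carlo estimate of P(|mean - mu| >= t) for X_j ~ Uniform[0, 1].

    Assumed setting: a = 0, b = 1, so c = 1 and mu = 0.5.
    """
    rng = np.random.default_rng(seed)
    samples = rng.uniform(0.0, 1.0, size=(trials, n))
    deviations = np.abs(samples.mean(axis=1) - 0.5)
    return float(np.mean(deviations >= t))

if __name__ == "__main__":
    n, t = 50, 0.1
    print("empirical:", empirical_deviation_probability(n, t))
    print("bound:    ", hoeffding_bound(n, t, c=1.0))
```

The bound is distribution-free, so it is typically loose for any particular distribution; the empirical frequency should sit well below it.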

Figures (4)

Figure 1: (a) In inductive learning, during the training phase (left), the training data $\mathcal{S}^S$ is used only once to produce the classical description of an inference operation $f$, which is stored in a classical memory. During testing (right), copies of the training data are no longer needed, and the inference operation $f$ can be used on an arbitrary number of test inputs $\rho(x)^{\otimes V}$ to produce the prediction $y$. (b) In transductive learning, an $S$-copy training set $\mathcal{S}^S$ is jointly processed with the test input $\rho(x)^{\otimes V}$ to produce the prediction $y$. Therefore, new copies of the training set are required for each test input. (c) Transductive learning with an induction step. Training proceeds as in (a), but with fewer copies, $S_{\rm train}<S$. At the test stage, the remaining $S_{\rm test}=S-S_{\rm train}$ copies are processed together with the input $\rho(x)^{\otimes V}$ and the classical memory created during training to produce the prediction $y$.
Figure 2: The error zoo expressed in terms of the average loss $L_{\mathcal{P}}(f)$ for a given inference function $f\in\mathcal{F}$ and of the dataset loss $L(f,\mathcal{S})$ based on the abstract training data set $\mathcal{S}$ with $N$ entries. Recall that availability of the abstract training data set entails knowledge of the quantum states in the training set. The true minimizers of the functions $L_{\mathcal{P}}(f)$ and $L(f,\mathcal{S})$ are denoted $f_*$ and $f_{\mathcal{S}}$, respectively. When only an $S$-copy version of the training data set is available, the learner obtains the approximate minimizer $f_{\mathcal{S}}^{S}$, which achieves the dataset loss $L(f_{\mathcal{S}}^{S},\mathcal{S})$ and the average, or testing, loss $L_{\mathcal{P}}(f_{\mathcal{S}}^{S})$.
Figure 3: (a) The state exponentiation algorithm uses a target state $\sigma$ and $m$ copies of $\rho$ to act on $\sigma$ with an approximate unitary $U=e^{imt \rho}$, up to an error $\mathcal{O}(m t^2)$. It only uses partial SWAP gates, as in the state exponentiation equation. (b) The phase estimation algorithm uses two registers, one initialized in $\ket 0^{\otimes m}$ and the other initialized in $\ket\psi$. It applies Hadamard gates, powers of controlled-$U$ gates, and an inverse quantum Fourier transform. For a generic unitary that is diagonal in some (possibly unknown) basis, $U= \sum_k e^{2\pi i \phi_k} \ket{\lambda_k}\!\!\bra{\lambda_k}$, this algorithm transforms a generic input $\ket\psi = \sum_k \psi_k \ket{\lambda_k}$ into $\sum_{k} \psi_k \ket{b_1,\dots,b_m} \ket{\lambda_k}$, where the bits $b_1,\dots,b_m$ depend on $k$. Measuring the first register provides a bitstring approximation of the phase, $\phi_k \approx b_1/2 + \dots + b_m /2^m$, and prepares the second register in the eigenvector $\ket{\lambda_k}$; see Nielsen and Chuang for an extended discussion of the precision as a function of $m$.
Figure 4: Two different algorithms for computing the overlap $\mathrm{Tr}[\rho\sigma]$: the swap test (a) and the swap measurement (b). The swap test (a) uses an ancillary qubit, two Hadamard gates, and a Fredkin (controlled-SWAP) gate, followed by a measurement of the ancilla, which yields the desired result as $\langle Z\rangle = \mathrm{Tr}[\rho\sigma]$. The swap measurement (b) uses the fact that $\mathrm{Tr}[\rho\sigma] = \mathrm{Tr}[(\rho\otimes\sigma)\,{\rm SWAP}]$ and that the swap operator on a pair of qubits can be diagonalized via a CNOT gate and a Hadamard gate, with diagonal form $\sum_{a,b=0}^1 (-1)^{ab}\ket{ab}\!\!\bra{ab}$. Performing these operations on each pair of qubits, one from $\rho$ and one from $\sigma$, we can then estimate the result as $\mathrm{Tr}[\rho\sigma] = \mathbb{E}[\prod_{i=1}^n (-1)^{a_i b_i}]$, where $n$ is the number of qubits in each of $\rho$ and $\sigma$.
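The state exponentiation step described in Figure 3(a) can be checked numerically. The sketch below assumes the partial SWAP gate is $e^{it\,{\rm SWAP}}$ (the sign is chosen to match $U=e^{imt\rho}$ in the caption; the paper's own convention may differ) and that each step consumes one fresh copy of $\rho$, which is then traced out. All function names are hypothetical.

```python
import numpy as np

def random_density_matrix(d, rng):
    """Random d x d density matrix (positive semidefinite, unit trace)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = a @ a.conj().T
    return m / np.trace(m).real

def swap_operator(d):
    """SWAP on two d-dimensional systems: |i j> -> |j i>."""
    s = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            s[i * d + j, j * d + i] = 1.0
    return s

def partial_trace_second(joint, d):
    """Trace out the second d-dimensional subsystem."""
    return np.trace(joint.reshape(d, d, d, d), axis1=1, axis2=3)

def state_exponentiation(sigma, rho, t, m):
    """Apply m partial SWAPs, each with a fresh copy of rho (assumed scheme)."""
    d = rho.shape[0]
    # SWAP^2 = I, so exp(i t SWAP) = cos(t) I + i sin(t) SWAP.
    w = np.cos(t) * np.eye(d * d) + 1j * np.sin(t) * swap_operator(d)
    for _ in range(m):
        joint = w @ np.kron(sigma, rho) @ w.conj().T
        sigma = partial_trace_second(joint, d)
    return sigma

def exact_evolution(sigma, rho, t, m):
    """Target state U sigma U^dagger with U = exp(i m t rho)."""
    vals, vecs = np.linalg.eigh(rho)
    u = vecs @ np.diag(np.exp(1j * m * t * vals)) @ vecs.conj().T
    return u @ sigma @ u.conj().T

def approximation_error(t, m, d=2, seed=0):
    """Frobenius-norm distance between the approximate and exact outputs."""
    rng = np.random.default_rng(seed)
    rho = random_density_matrix(d, rng)
    sigma = random_density_matrix(d, rng)
    diff = state_exponentiation(sigma, rho, t, m) - exact_evolution(sigma, rho, t, m)
    return float(np.linalg.norm(diff))

if __name__ == "__main__":
    # Same total angle m*t; a smaller step t should give a smaller error.
    print(approximation_error(0.01, 20), approximation_error(0.005, 40))
```

Halving $t$ while doubling $m$ keeps the total angle $mt$ fixed, so a shrinking error is consistent with the $\mathcal{O}(mt^2)$ scaling quoted in the caption.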
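The two overlap estimators of Figure 4 can be compared on a small example. The following sketch treats single-qubit $\rho$ and $\sigma$ ($n=1$) and computes exact expectation values from density matrices rather than sampling measurement outcomes. The function names and the qubit ordering (ancilla first, CNOT controlled on the qubit of $\rho$) are assumptions made for illustration.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
I4 = np.eye(4)
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)

def random_qubit_state(rng):
    """Random single-qubit density matrix."""
    a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    m = a @ a.conj().T
    return m / np.trace(m).real

def swap_test_expectation(rho, sigma):
    """<Z> on the ancilla after Hadamard, controlled-SWAP, Hadamard."""
    zeros = np.zeros((4, 4))
    cswap = np.block([[I4, zeros], [zeros, SWAP]])  # Fredkin gate
    h_anc = np.kron(H, I4)
    circuit = h_anc @ cswap @ h_anc
    ancilla = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
    state = np.kron(ancilla, np.kron(rho, sigma))
    out = circuit @ state @ circuit.conj().T
    z_anc = np.kron(np.diag([1.0, -1.0]), I4)
    return float(np.real(np.trace(z_anc @ out)))

def swap_measurement_expectation(rho, sigma):
    """E[(-1)^{ab}] after CNOT and Hadamard, measured in the computational basis."""
    v = np.kron(H, I2) @ CNOT  # rotates the Bell basis to |ab>
    out = v @ np.kron(rho, sigma) @ v.conj().T
    probs = np.real(np.diag(out))          # outcomes ordered 00, 01, 10, 11
    signs = np.array([1.0, 1.0, 1.0, -1.0])  # (-1)^{ab}
    return float(probs @ signs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho, sigma = random_qubit_state(rng), random_qubit_state(rng)
    print(np.trace(rho @ sigma).real,
          swap_test_expectation(rho, sigma),
          swap_measurement_expectation(rho, sigma))
```

The swap measurement needs no ancilla and no three-qubit gate, at the price of measuring both systems; in an experiment either expectation value would be estimated from repeated shots.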

Theorems & Definitions (11)

Lemma 1.1: Hoeffding's inequality
Theorem 1.2
Lemma 1.3
Theorem 1.4: Uniform deviation bound [bartlett2021deep]
Theorem 1.5: Khintchine inequalities [haagerup1981best]
Theorem 1.6: Operator Khintchine inequalities [lust1986inegalites, candes2012exact]
Theorem 1.7: Tropp inequality [tropp2015introduction]
Lemma 1.8: Contraction lemma [shalev2014understanding]
Theorem 1.9: credited to Blanchard [scott2005learning, cohen2020learning]
Corollary 1.10: adapted from Section 3.3 of Ref. [cohen2020learning]
...and 1 more
