Understanding quantum machine learning also requires rethinking generalization

Elies Gil-Fuster; Jens Eisert; Carlos Bravo-Prieto

TL;DR

This work reveals a fundamental mismatch between conventional uniform generalization bounds and the behavior of quantum neural networks by showing that parameterized quantum circuits can memorize random labels and random quantum states, even with modest training data. Through systematic randomization tests on a QCNN and a rigorous finite-sample expressivity analysis, the authors demonstrate that good generalization can coexist with memorization, rendering uniform bounds vacuous in the examined regimes. They provide constructive analytical results showing that polynomial-size PQCs can realize arbitrary labelings under a well-conditioned Gram matrix and a distinguishability condition, highlighting a memorization capacity that challenges traditional generalization guarantees. The findings argue for a paradigm shift toward non-uniform, task-specific generalization measures and symmetry-aware analyses to properly assess quantum learning models and guide future PQC design and evaluation.

Abstract

Quantum machine learning models have shown successful generalization performance even when trained with few data. In this work, through systematic randomization experiments, we show that traditional approaches to understanding generalization fail to explain the behavior of such quantum models. Our experiments reveal that state-of-the-art quantum neural networks accurately fit random states and random labeling of training data. This ability to memorize random data defies current notions of small generalization error, problematizing approaches that build on complexity measures such as the VC dimension, the Rademacher complexity, and all their uniform relatives. We complement our empirical results with a theoretical construction showing that quantum neural networks can fit arbitrary labels to quantum states, hinting at their memorization ability. Our results do not preclude the possibility of good generalization with few training data but rather rule out any possible guarantees based only on the properties of the model family. These findings expose a fundamental challenge in the conventional understanding of generalization in quantum machine learning and highlight the need for a paradigm shift in the study of quantum models for machine learning tasks.

Paper Structure (16 sections, 5 theorems, 45 equations, 3 figures, 1 algorithm)

Introduction
Results
Statistical learning theory background
Randomization tests
Numerical results
Random labels
Corrupted labels
Random states
Implications
Analytical results
Discussion
Methods
Numerical methods
Analytical methods
Proof of Theorem 1
...and 1 more sections

Key Result

Theorem 1

Let $\rho_1,\ldots,\rho_N$ be unknown quantum states on $n\in\mathbb{N}$ qubits, with $N\in\mathcal{O}(\mathop{\mathrm{poly}}\nolimits(n))$, and let $W$ be the Gram matrix $[W]_{i,j}=\mathop{\mathrm{tr}}\nolimits(\rho_i\rho_j)$. If $W$ is well-conditioned, then, for any real numbers $y_1,\ldots,y_N\in\mathbb{R}$, we can construct a quantum circuit of depth $\mathop{\mathrm{poly}}\nolimits(n)$ as an observable $\mathcal{M}_y$ such that $\mathop{\mathrm{tr}}\nolimits(\rho_i\mathcal{M}_y)=y_i$ for all $i\in\{1,\ldots,N\}$.
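
The linear algebra behind a statement of this kind can be checked numerically on a small instance. The sketch below is not the paper's circuit construction. It assumes the Gram matrix uses the Hilbert-Schmidt inner product, $[W]_{i,j}=\mathrm{tr}(\rho_i\rho_j)$, and that the observable is a linear combination of the states with weights solving $Wc=y$. The helper names `random_pure_state` and `fit_observable` are hypothetical, and random pure states are used only because they tend to give a well-conditioned $W$.

```python
import numpy as np

def random_pure_state(dim, rng):
    """Return the density matrix of a random pure state (illustrative data)."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())

def fit_observable(states, labels):
    """Build M = sum_j c_j rho_j with W c = y, where W_ij = tr(rho_i rho_j)."""
    gram = np.array([[np.trace(a @ b).real for b in states] for a in states])
    coeffs = np.linalg.solve(gram, labels)  # requires W to be invertible
    observable = sum(c * rho for c, rho in zip(coeffs, states))
    return observable, gram

rng = np.random.default_rng(0)
n_qubits, n_states = 3, 5
states = [random_pure_state(2**n_qubits, rng) for _ in range(n_states)]
labels = rng.normal(size=n_states)  # arbitrary real labels y_1, ..., y_N

observable, gram = fit_observable(states, labels)
# tr(rho_i M) = sum_j c_j W_ij = (W c)_i = y_i, up to floating-point error.
predictions = np.array([np.trace(rho @ observable).real for rho in states])

print("condition number of W:", np.linalg.cond(gram))
print("max label error:", np.max(np.abs(predictions - labels)))
```

The condition number matters because the weights scale with $W^{-1}y$: an ill-conditioned Gram matrix would demand very large coefficients to fit the same labels.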

Figures (3)

Figure 1: Visualization of our framework. (a) In the empirical experiments, a distribution of labeled quantum data $\mathcal{D}$ undergoes a randomization process, leading to a corrupted data distribution $\hat{\mathcal{D}}$. A training set and a test set are drawn independently from each distribution. The training sets are then fed into an optimization algorithm, which identifies the best fit for each data set individually from a family of parameterized quantum circuits $\mathcal{F}_Q$. This process generates two hypotheses: one for the original data, $f_\text{original}$, and another for the corrupted data, $f_\text{corrupted}$. We empirically find that both hypotheses can perfectly fit their training data, leading to small training errors. At the same time, $f_\text{original}$ achieves a small test error, indicating good learning performance, as quantified by a small generalization gap $\mathop{\mathrm{gen}}\nolimits(f_\text{original}) = \text{small}$. In contrast, the randomization process causes $f_\text{corrupted}$ to achieve a large test error, which results in a large generalization gap $\mathop{\mathrm{gen}}\nolimits(f_\text{corrupted}) = \text{large}$. (b) Uniform generalization bounds in the QML literature assign the same upper bound $g_\text{unif}$ to the entire function family, without considering the specific characteristics of each individual function. We combine two findings: (1) we have identified a hypothesis with a large empirical generalization gap, and (2) uniform generalization bounds impose identical upper bounds on all hypotheses. Consequently, any uniform generalization bound derived from the literature must be regarded as "large", meaning that all such bounds are loose for that training data size. A loose generalization bound does not exclude the possibility of good generalization; rather, it fails to explain or predict such successful behavior.
Figure 2: Phase diagram of the generalized cluster Hamiltonian. The ground-state phase diagram of the Hamiltonian of Eq. \ref{eq:hamiltonian}. It comprises four phases: (I) symmetry-protected topological, (II) ferromagnetic, (III) anti-ferromagnetic, and (IV) trivial.
Figure 3: Randomization tests for quantum phase recognition. (a) Generalization gap as a function of the training set size achieved by the quantum convolutional neural network (QCNN) architecture. The QCNN is trained on real data, random label data, and random state data. The horizontal dashed line is the largest generalization gap attainable, characterized by zero training error and test error equal to random guessing ($0.75$, because the task has four possible classes). The shaded area corresponds to the standard deviation across different experiment repetitions. For the real data and random labels, we employed $8$, $16$, and $32$ qubits, while for the random states, we employed $8$, $10$, and $12$ qubits. We observe that both random labels and random states exhibit a similar trend in the generalization gap, with a slight discrepancy in height due to the different relative frequencies of the four classes under the respective randomization protocols. In both cases, the test accuracy fails to surpass that of random guessing. Notably, the largest generalization gap occurs in the random labels experiments when using a training set of up to size $N=10$, highlighting the memorization capacity of this particular QCNN. The training with uncorrupted data yields behavior in accordance with previous results \cite{caro2022generalization}. (b) Test error as a function of the ratio of label corruption after training the QCNN on training sets of size $N \in \{4, 6, 8\}$ and $n=8$ qubits. The plot illustrates the interpolation between uncorrupted data ($r=0$) and random labels ($r=1$). As the label corruption ratio approaches $1$, the test accuracy drops to the level of random guessing. The dependence of the test error on the label corruption reveals the ability of the QCNN to extract the remaining signal despite the noise in the initial training set. The inset focuses on the case $N=6$: it shows the optimization speed for four different levels of corruption, namely $0$, $2$, $4$, and $6$ out of $6$ labels being corrupted, and provides insights into the average convergence time. The shaded area denotes the variance over five experiment repetitions with independently initialized QCNN parameters. Surprisingly, on average, fitting completely random noise takes less time than fitting unperturbed data. This phenomenon emphasizes that QCNNs can accurately memorize random data.
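
The logic of the randomization tests in Figures 1 and 3 can be reproduced on a toy classical problem. The sketch below makes several assumptions that are not from the paper: a 1-nearest-neighbor classifier stands in for the QCNN as a model that memorizes any training set, the data are four synthetic Gaussian clusters rather than ground states, corrupted labels are redrawn uniformly at random (so some may coincide with the original label), and test labels are left uncorrupted. All function names are hypothetical.

```python
import numpy as np

NUM_CLASSES = 4
CENTERS = 10.0 * np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

def sample(n, rng):
    """Draw n points from four well-separated Gaussian clusters (toy data)."""
    y = rng.integers(NUM_CLASSES, size=n)
    x = CENTERS[y] + rng.normal(scale=0.5, size=(n, 2))
    return x, y

def corrupt_labels(labels, ratio, rng):
    """Replace a fraction `ratio` of the labels with uniformly random classes."""
    corrupted = labels.copy()
    k = int(round(ratio * len(labels)))
    idx = rng.choice(len(labels), size=k, replace=False)
    corrupted[idx] = rng.integers(NUM_CLASSES, size=k)
    return corrupted

def one_nn_predict(train_x, train_y, query_x):
    """Label each query point with the label of its nearest training point."""
    dists = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    return train_y[np.argmin(dists, axis=1)]

def error(pred, target):
    """Fraction of misclassified points."""
    return float(np.mean(pred != target))

rng = np.random.default_rng(0)
train_x, train_y = sample(60, rng)
test_x, test_y = sample(400, rng)

results = {}
for ratio in (0.0, 0.5, 1.0):
    noisy_y = corrupt_labels(train_y, ratio, rng)
    train_err = error(one_nn_predict(train_x, noisy_y, train_x), noisy_y)
    test_err = error(one_nn_predict(train_x, noisy_y, test_x), test_y)
    results[ratio] = (train_err, test_err, abs(test_err - train_err))
    print(f"r={ratio:.1f}  train={train_err:.2f}  test={test_err:.2f}")

# Random guessing among four classes errs with probability 1 - 1/4 = 0.75.
random_guess_error = 1.0 - 1.0 / NUM_CLASSES
# A uniform bound must hold for every hypothesis in the family, so it
# cannot be smaller than the largest gap observed for any one of them.
g_unif_lower = max(gap for _, _, gap in results.values())
print("random-guess error:", random_guess_error)
print("lower limit on any uniform bound:", g_unif_lower)
```

The training error stays at zero for every corruption ratio because each training point is its own nearest neighbor. The same model family therefore produces both a small and a large generalization gap, which is the situation panel (b) of Figure 1 describes.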

Theorems & Definitions (9)

Theorem 1: Finite sample expressivity of quantum circuits
Definition 1: Distinguishability condition
Theorem 2: Finite sample expressivity of PQCs
Theorem 3: Conditioning as a convex program \ref{alg:convex}
proof
Theorem 1: Finite sample expressivity of quantum circuits
proof
Theorem 2: Finite sample expressivity of PQCs
proof
