Tight Generalization Bound for Supervised Quantum Machine Learning

Xin Wang, Rebing Wu

TL;DR

The paper derives a tight generalization bound for quantum machine learning that applies to a wide range of supervised tasks, data, and models, and it points out that previous bounds relying on big-O notation may give misleading suggestions about the generalization error.

Abstract

We derive a tight generalization bound for quantum machine learning that is applicable to a wide range of supervised tasks, data, and models. Our bound is both efficiently computable and free of big-O notation. Furthermore, we point out that previous bounds relying on big-O notation may provide misleading suggestions regarding the generalization error. Our generalization bound demonstrates that for quantum machine learning models of arbitrary size and depth, the sample size is the most dominant factor governing the generalization error. Additionally, the spectral norm of the measurement observable, the bound and Lipschitz constant of the selected risk function also influence the generalization upper bound. However, the number of quantum gates, the number of qubits, data encoding methods, and hyperparameters chosen during the learning process such as batch size, epochs, learning rate, and optimizer do not significantly impact the generalization capability of quantum machine learning. We experimentally demonstrate the tightness of our generalization bound across classification and regression tasks. Furthermore, we show that our tight generalization upper bound holds even when labels are completely randomized. We thus bring clarity to the fundamental question of generalization in quantum machine learning.
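
The quantity being bounded can be made concrete with a small sketch. It assumes the usual convention that the generalization error is the signed gap between test risk and training risk, which fits the figure captions' remark that measured values can be negative. The helper names `empirical_risk` and `generalization_gap` and the toy 0-1 losses are illustrative assumptions, not taken from the paper.

```python
from statistics import fmean


def empirical_risk(losses):
    """Mean per-sample loss over a dataset."""
    return fmean(losses)


def generalization_gap(train_losses, test_losses):
    """Signed gap: test risk minus training risk (may be negative)."""
    return empirical_risk(test_losses) - empirical_risk(train_losses)


# Toy 0-1 losses (1 = misclassified); illustrative values only.
train_losses = [0, 0, 1, 0]
test_losses = [0, 1, 1, 0]
gap = generalization_gap(train_losses, test_losses)
```

With a 0-1 loss both risks lie in [0, 1], so the gap can never exceed 1. A generalization upper bound of the kind described in the abstract would be compared against this measured gap.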

Paper Structure

This paper contains 19 sections, 12 theorems, 47 equations, 14 figures.

Key Result

Theorem 1

Let $\mathcal{D}$ be a data distribution over $\mathcal{X} \times \mathcal{Y}$, and let $S = \{(\boldsymbol{\alpha}^{(m)}, y^{(m)})\}_{m=1}^M$ be a dataset of $M$ independent and identically distributed (i.i.d.) samples drawn from $\mathcal{D}$. Let the observable $O$ be a Pauli string with spectral

Figures (14)

  • Figure 1: (a) Quantum machine learning workflow: Quantum data (quantum states) are prepared using quantum circuits, or classical data are encoded into quantum states through some encoding scheme. The input is then processed through parameterized quantum circuits for learning, and the outputs are obtained through measurements. The model characteristics include number of qubits, encoding methods, and model complexity. (b) Training and test errors decrease as training epochs increase. During the training process of QML models, the choice of learning rate, optimizer, and batch size may all influence the training dynamics. (c) Our derived theoretical generalization error upper bound is tight and depends only on sample size, independent of the number of qubits, encoding methods, model complexity, learning rate, optimizer, and batch size.
  • Figure 2: (a) Phase diagram of the axial next-nearest-neighbor Ising (ANNNI) model, illustrating the boundaries between ordered and disordered phases in the parameter space defined by $\kappa$ and $h$; (b) Parameterized quantum circuit architecture for phase classification consisting of $L$ layers of rotation gates and controlled gates, where each qubit undergoes rotations $R_z(\theta_{1}) R_y(\theta_2) R_z(\theta_3)$ followed by ring-pattern CNOT gates creating entanglement.
  • Figure 3: (a) Training accuracy and test accuracy under different sample sizes; (b) Comparison between experimental generalization error and theoretical generalization upper bound with confidence $1-\delta = 0.9$. The maximum possible generalization error is 1. The error bars represent the minimum and maximum values across 10 independent runs with different training sets, with the central line showing the mean value.
  • Figure 4: (a) Comparison between our theoretical generalization upper bound and previous work (Ref. caro2022generalization). Both bounds are shown with confidence $1-\delta = 0.9$. Some minimal experimental results with negative generalization error are not displayed due to the logarithmic scale. (b) Comparison of theoretical generalization error upper bounds under different model complexities, where layers represent complexity. The error bars represent the minimum and maximum values across 10 independent runs with different training sets or random seeds, with the central line showing the mean value.
  • Figure 5: (a) Training accuracy and test accuracy under random labels. (b) Comparison between experimental generalization error and our generalization upper bound under random labels. The error bars represent the minimum and maximum values across 10 independent runs with different training sets, with the central line showing the mean value.
  • ...and 9 more figures

Theorems & Definitions (20)

  • Theorem 1
  • Theorem A.1
  • proof
  • Corollary A.1
  • proof
  • Lemma A.1: Theorem C.1 in Ref. wangpredictive
  • Definition B.1: Empirical Rademacher Complexity
  • Lemma B.1: Theorem 3.3 in Ref. mohri2018foundations
  • Lemma B.2: Talagrand's lemma, Lemma 5.7 in Ref. mohri2018foundations
  • Lemma B.3
  • ...and 10 more
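
Definition B.1 in the list above names the empirical Rademacher complexity. As a rough illustration, the sketch below uses the standard textbook form, $\hat{\mathfrak{R}}_S(\mathcal{G}) = \mathbb{E}_{\boldsymbol{\sigma}}\big[\sup_{g \in \mathcal{G}} \frac{1}{M} \sum_{m=1}^{M} \sigma_m g(z_m)\big]$ with independent uniform signs $\sigma_m \in \{-1, +1\}$, and estimates it by Monte Carlo for a finite function class. The paper's own statement is not reproduced on this page and may differ in notation. The function name `empirical_rademacher`, the finite-class restriction, and the toy inputs are assumptions made for illustration only.

```python
import random


def empirical_rademacher(outputs, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ max_g (1/M) * sum_m sigma_m * g(z_m) ].

    outputs: list of rows, one row per function g in a finite class,
             each row holding g evaluated on the same M samples.
    """
    rng = random.Random(seed)
    m = len(outputs[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw one vector of independent uniform +/-1 signs.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the (finite) class of the sign-weighted average.
        total += max(
            sum(s * v for s, v in zip(sigma, row)) / m for row in outputs
        )
    return total / n_draws


# Toy class: a constant function and its negation on M = 4 samples.
g = [1.0, 1.0, 1.0, 1.0]
neg_g = [-1.0, -1.0, -1.0, -1.0]
single = empirical_rademacher([g])
pair = empirical_rademacher([g, neg_g])
```

A single fixed function cannot correlate with random signs on average, so `single` should be close to zero, while the richer class `{g, -g}` yields a strictly larger value. This is the sense in which the quantity measures the capacity of a function class.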