Concept learning of parameterized quantum models from limited measurements

Beng Yee Gan; Po-Wei Huang; Elies Gil-Fuster; Patrick Rebentrost

TL;DR

The paper develops a kernel-based, probabilistic framework for learning parameterized quantum models under finite measurement shots, revealing an asymmetry: increasing the number of training inputs $N_1$ improves learning even in the single-shot regime ($N_s=1$), while increasing $N_s$ yields diminishing returns beyond a constant factor. It characterizes the learning process as $p$-concept learning with explicit and implicit losses, provides Alphatron-like algorithms with provable guarantees, and shows how a Lipschitz link function mitigates shot-noise–induced variance. The work also connects PQCs to classical surrogates via Fourier representations and Random Fourier Features, deriving error bounds for both link-assisted and link-free models, and validates the theory through numerical experiments on data-reuploading PQCs. Overall, it offers budget-aware guidance for collecting classical training data and refining classical surrogates of quantum models, with implications for robust classical learnability in the presence of shot noise.
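
For readers unfamiliar with Random Fourier Features, the following is a minimal, generic sketch of the technique: a randomized cosine feature map whose inner product approximates a shift-invariant kernel. It uses a one-dimensional Gaussian kernel purely for illustration. The function names (`make_rff`, `rff_kernel`, `gaussian_kernel`), the choice of kernel, and the feature count are assumptions of this sketch and are not the surrogate construction analysed in the paper.

```python
import math
import random

def make_rff(num_features, seed=0):
    """Build a random cosine feature map for the unit-bandwidth Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies drawn from the kernel's spectral density (standard normal here).
    omegas = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
    # Uniform random phases cancel the cross term in expectation.
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(w * x + b) for w, b in zip(omegas, phases)]

    return features

def rff_kernel(features, x, y):
    """Inner product of random features, a Monte Carlo estimate of the kernel."""
    return sum(a * b for a, b in zip(features(x), features(y)))

def gaussian_kernel(x, y):
    """Exact kernel exp(-(x - y)^2 / 2) that the random features approximate."""
    return math.exp(-0.5 * (x - y) ** 2)

features = make_rff(20000)
print(rff_kernel(features, 0.3, 1.1), gaussian_kernel(0.3, 1.1))
```

The approximation error shrinks roughly as one over the square root of the number of random features, which is why the feature count acts as an accuracy knob in surrogate constructions of this kind.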

Abstract

Classical learning of the expectation values of observables for quantum states is a natural variant of learning quantum states or channels. While learning-theoretic frameworks establish the sample complexity and the number of measurement shots per sample required for learning such statistical quantities, the interplay between these two variables has not been adequately quantified before. In this work, we take the probabilistic nature of quantum measurements into account in classical modelling and discuss these quantities under a single unified learning framework. We provide provable guarantees for learning parameterized quantum models that also quantify the asymmetrical effects and interplay of the two variables on the performance of learning algorithms. These results show that while increasing the sample size enhances the learning performance of classical machines, even with single-shot estimates, the improvements from increasing measurements become asymptotically trivial beyond a constant factor. We further apply our framework and theoretical guarantees to study the impact of measurement noise on the classical surrogation of parameterized quantum circuit models. Our work provides new tools to analyse the operational influence of finite measurement noise in the classical learning of quantum systems.
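
The claim that learning improves with sample size even for single-shot labels can be illustrated with a toy simulation. The sketch below is not the paper's algorithm. It assumes a hypothetical one-parameter target $f(x) = w\cos(x)$, measurement outcomes in $\{-1, +1\}$ with mean $f(x)$, and an ordinary least-squares fit. All names and parameter values are illustrative. Because this toy model is well specified, it shows only the benefit of growing $N_1$ at $N_s = 1$. It does not reproduce the saturation in $N_s$ or the bounds derived in the paper.

```python
import math
import random

def sample_label(x, w_true, n_shots, rng):
    """Average n_shots outcomes in {-1, +1} whose mean is w_true * cos(x)."""
    p_plus = 0.5 * (1.0 + w_true * math.cos(x))
    outcomes = [1.0 if rng.random() < p_plus else -1.0 for _ in range(n_shots)]
    return sum(outcomes) / n_shots

def fit_and_risk(n_inputs, n_shots, w_true=0.6, seed=0):
    """Least-squares fit of w on noisy labels; returns the squared-error risk."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_inputs)]
    ys = [sample_label(x, w_true, n_shots, rng) for x in xs]
    w_hat = sum(y * math.cos(x) for x, y in zip(xs, ys)) / sum(
        math.cos(x) ** 2 for x in xs
    )
    # For x uniform on [0, 2*pi], E[cos(x)^2] = 1/2, so the risk has a closed form.
    return 0.5 * (w_hat - w_true) ** 2

def mean_risk(n_inputs, n_shots, n_repeats=20):
    """Average the risk over independent repetitions (different seeds)."""
    return sum(fit_and_risk(n_inputs, n_shots, seed=s) for s in range(n_repeats)) / n_repeats

# Single-shot labels (n_shots = 1) with a growing number of training inputs.
for n_inputs in (50, 500, 5000):
    print(n_inputs, mean_risk(n_inputs, 1))
```

Each individual label here is a single $\pm 1$ outcome and therefore very noisy, yet the fitted parameter still converges as more inputs are collected, because the regression averages the shot noise across inputs.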

Paper Structure (32 sections, 12 theorems, 99 equations, 4 figures, 1 table, 1 algorithm)

Introduction
Preliminaries
Probabilistic concept learning
Hypothesis class for modelling probabilistic concepts
The family of parameterized quantum models
PQCs and their classical Fourier representations
Data extraction from parameterized quantum models
Parameterized Quantum Models as Probabilistic Concepts
Algorithm for concept learning of parameterized quantum models
Asymmetrical effects of $N_1$ and $N_s$
Trade-offs between $N_1$ and $N_s$
Shot-noise dependent bias-variance trade-off
Classical surrogates of PQC models as probabilistic concepts
Classical approximation of PQC models
Modelling PQCs with and without link functions
...and 17 more sections

Key Result

Theorem 1

We are given a quantum observable $O$ such that $\|O\|_\infty = \Delta$. With this observable, we have a quantum model whose expected output admits the following classical representation: $\operatorname{tr}(\rho(\boldsymbol{x})O) = u(\langle \boldsymbol{w}, \boldsymbol{\phi}(\boldsymbol{x}) \rangle + \dots$ where $\epsilon_2 = \sqrt[4]{\frac{\log(1/\delta)}{N_1}}$, …

Figures (4)

Figure 1: Concept learning of parameterized quantum models. (a) To learn a quantum model, one probes it with $N$ different input data points $\boldsymbol{x}$ and constructs an estimator of the model output $y = f(\boldsymbol{x})$ conditioned on the input. Such an estimator $\bar{y}$ can be constructed by averaging over $N_s$ repeated quantum measurements. (b) Using the data pairs ($\boldsymbol{x}_i$, $\bar{y}_i$) collected from the quantum model, the task is to classically learn a representation $h^*$ of the quantum model such that the output $h(\boldsymbol{x})$ of the classical representation is close to the underlying expected output $y = f(\boldsymbol{x})$ of the quantum model for arbitrary $\boldsymbol{x}$. (c) The number of measurement shots $N_s$ determines how close the estimator $\bar{y}$ (blue dots) is to the underlying expected value $f(\boldsymbol{x})$ (black solid line).
Figure 2: Numerical illustrations of Corollary 1 and Corollary 2, respectively, with $\delta = 0.01$ and $\bar{\sigma}=L=B=\Delta=1$. (a) The asymmetrical effect of the number of training samples $N_1$ and the number of measurement shots $N_s$ on the explicit risk $R_{\mathrm{expl}}(h)$. (b) For a fixed total measurement budget $N_{\mathrm{tot}}$, the optimal pair of $N_1$ and $N_s$ changes with $\gamma$. When $\gamma = 0$, the optimal shot number is $N_s = 1$, but it depends on $\gamma$ when $\gamma > 0$. All curves are computed with $N_{\mathrm{tot}} = 600$ and $N_s \in \{1,2,3,\dots,24,25\}$.
Figure 3: (a) The averaged explicit risk for different numbers of training data points $N_1$ and numbers of measurement shots $N_s$. The overall trends agree with the theoretical prediction in Figure 2: for a fixed $N_1$, the explicit risk saturates beyond some threshold value of $N_s$, but the explicit risk can be reduced by increasing $N_1$ regardless of the value of $N_s$. (b) When models in $\mathcal{H}_{10}$ are presented with a sufficiently large dataset, i.e., $N_1 = 24000$, the exact function (black dashed line) can be learned even if the labels are estimated with one measurement shot. (c) Twenty different trained models (dash-dotted lines of various colours) from $\mathcal{H}_{10}$ and their mean predictors (solid red line) for $N_1 = 1, 10, 100$. Increasing $N_s$ reduces the shot noise, hence reducing the spread of the trained models. (d) The bias-variance trade-off curve. The bias and variance of the trained models in (c) are calculated and plotted in the purple dotted box. The remaining values are computed with the same procedure as in (c) for $\mathcal{H}_d$ with $d \in \{ 1,2,3,4,5,6,7,8,9 \}$. Both the bias and the variance decrease when $N_s$ increases, illustrating the shot-noise dependent bias-variance trade-off. (e) Bias and variance for models with and without the link function $u$. The models without the link function are more expressive and hence more susceptible to shot noise, i.e., they have a higher tendency to overfit the shot noise. Increasing $N_s$ reduces the shot noise, hence suppressing the shot-noise-induced variance. The same target function is used in all of these numerical experiments.
Figure 4: The trade-off between $N_1$ and $N_s$ under a fixed total measurement budget of $N_{\mathrm{tot}} = 600$ for $\gamma \in \{0,1,2,3,4,5\}$ and $N_s \in \{1,2,3,\dots,24,25\}$. When $N_1$ and $N_s$ are treated equally, i.e., $\gamma = 0$, the optimal pair of $N_1$ and $N_s$ is given by $(N_1^*,N_s^*) = (600,1)$. As $\gamma$ increases, more measurement shots, and hence a smaller $N_1$, are required to achieve better model performance. However, there is a threshold in $N_s$ beyond which the performance of the models worsens.
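
The budget split discussed in the captions of Figures 2 and 4 can be enumerated directly. The sketch below assumes that the total budget is spent as $N_1 N_s \le N_{\mathrm{tot}}$, so that $N_1 = \lfloor N_{\mathrm{tot}} / N_s \rfloor$. This is consistent with the reported optimum $(N_1^*, N_s^*) = (600, 1)$ at $\gamma = 0$, but it is an assumption here, and the role of $\gamma$ is not modelled. The helper names are hypothetical. The second function uses the standard fact that the mean of $N_s$ independent outcomes bounded in $[-\Delta, \Delta]$ has standard deviation at most $\Delta/\sqrt{N_s}$.

```python
import math

def budget_splits(n_total, shot_values):
    """Return (N_1, N_s) pairs with N_1 = floor(n_total / N_s)."""
    return [(n_total // n_s, n_s) for n_s in shot_values if 1 <= n_s <= n_total]

def label_std_bound(n_shots, delta=1.0):
    """Worst-case standard deviation of the N_s-shot mean for outcomes in [-delta, delta]."""
    return delta / math.sqrt(n_shots)

# Settings taken from the figure captions: N_tot = 600 and N_s in {1, ..., 25}.
splits = budget_splits(600, range(1, 26))
for n_inputs, n_shots in splits:
    print(n_inputs, n_shots, round(label_std_bound(n_shots), 3))
```

Reading the printed rows from top to bottom shows the trade-off: each additional shot per input reduces the label noise only by a square-root factor while it cuts the number of distinct training inputs roughly in proportion.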

Theorems & Definitions (15)

Definition 1: $p$-concept
Definition 2: $p$-concept class
Definition 3: $p$-concept learning
Theorem 1: $p$-concept learnability of PQMs
Corollary 1: Asymmetrical effects of $N_1$ and $N_s$
Corollary 2: Trade-off between $N_1$ and $N_s$
Corollary 3
Lemma 1
Lemma B.1
Lemma B.2: Vector Bernstein inequality (Lem. 18 of Kohler and Lucchi, 2017)
...and 5 more
