The role of data-induced randomness in quantum machine learning classification tasks

Berta Casas, Xavier Bonet-Monroig, Adrián Pérez-Salinas

TL;DR

This work analyzes how data-embedding choices influence quantum machine learning for binary classification. It introduces the class margin, a metric that links data-induced randomness to classification accuracy via shadowed observable moments. It proves that when embedded states resemble Haar randomness, classification performance is fundamentally limited, and demonstrates this through a Discrete Logarithm Problem–based example, an observable-bias study, and a comparison of feature-map versus data re-uploading models. The results show that avoiding Haar-like distributions in embeddings and carefully selecting observables are crucial for practical QML, with the class margin serving as a diagnostic tool to assess and guide embedding design. Overall, the paper provides analytical bounds and practical insights that connect average randomness, design theory, and generalization considerations to the viability of QML classifiers on near-term devices.

Abstract

Quantum machine learning (QML) has surged as a prominent area of research with the objective to go beyond the capabilities of classical machine learning models. A critical aspect of any learning task is the process of data embedding, which directly impacts model performance. Poorly designed data-embedding strategies can significantly impact the success of a learning task. Despite its importance, rigorous analyses of data-embedding effects are limited, leaving many cases without effective assessment methods. In this work, we introduce a metric for binary classification tasks, the class margin, by merging the concepts of average randomness and classification margin. This metric analytically connects data-induced randomness with classification accuracy for a given data-embedding map. We benchmark a range of data-embedding strategies through class margin, demonstrating that data-induced randomness imposes a limit on classification performance. We expect this work to provide a new approach to evaluate QML models by their data-embedding processes, addressing gaps left by existing analytical tools.

Paper Structure

This paper contains 20 sections, 15 theorems, 123 equations, 6 figures.

Key Result

Lemma 1

Consider the class margin $z(\boldsymbol{x})$ for a given data point $\boldsymbol{x}$. Suppose the classifier performs $M$ independent measurements of $z(\boldsymbol{x})$ for this data point. Then, for the classifier to correctly classify $\boldsymbol{x}$ with probability at least $1 - \delta$, it s where $b$ is the decision threshold.
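
The condition in the lemma is cut off above, so the following is only a hedged illustration of how the number of measurements $M$, the failure probability $\delta$, and the distance to the threshold $|b - z(\boldsymbol{x})|$ can trade off. It uses the standard one-sided Hoeffding inequality as a stand-in, not the paper's stated bound. The function names, the `value_range` parameter, and the assumption that each single measurement outcome lies in an interval of width `value_range` are hypothetical choices made for this sketch.

```python
import math


def measurements_needed(gap, delta, value_range=1.0):
    """Smallest M with exp(-2 * M * gap**2 / value_range**2) <= delta.

    Assumption (not from the paper): outcomes are i.i.d. and bounded in an
    interval of width `value_range`, so Hoeffding's inequality applies.
    `gap` plays the role of |b - z(x)|.
    """
    if gap <= 0:
        raise ValueError("gap must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.ceil(value_range**2 * math.log(1.0 / delta) / (2.0 * gap**2))


def resolvable_gap(num_measurements, delta, value_range=1.0):
    """Smallest gap that M measurements resolve with probability >= 1 - delta."""
    if num_measurements <= 0:
        raise ValueError("num_measurements must be positive")
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * num_measurements))


# The resolvable gap shrinks as M**(-1/2): quadrupling M halves it.
for m in (100, 400, 1600):
    print(m, resolvable_gap(m, delta=0.05))
```

Under these assumptions the resolvable gap scales as $M^{-1/2}$, which matches the $\mathcal{O}(M^{-1/2})$ window described in the caption of Figure 2.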

Figures (6)

  • Figure 1: Graphical interpretation of (a) tunable decision boundaries and (b) tunable embedding kernels. In feature-map models, optimization can only provide the optimal observable, and performance is upper bounded by the feature map. Re-uploading models are capable of optimizing the data embedding to perform classification over arbitrary data sets.
  • Figure 2: Illustration of the classification criteria and the definition of the class margin $z(\boldsymbol{x})$. In both plots, the gray window indicates the region of data points $\mathcal{O} (M^{-1/2})$ that lie so close to the decision boundary that they cannot be resolved without requiring exponentially many resources as $n$ increases. $(a)$ Example of an expected value histogram for a binary classification problem. The yellow dashed line represents the misclassified points based on the criteria defined in the text. $(b)$ In this plot, data points with $z(\boldsymbol{x}) > b$ are misclassified. We also depict the distance to the boundary, $b-z(\boldsymbol{x})$, using a dashed line.
  • Figure 3: Numerical estimation of the anti-randomness $\mathcal{A}_t(S, \hat{O}_Z)$ normalized with respect to $\hat{\mu}_t(\hat{O}_Z)$ and averaged over the set of states $\mathcal{X}_{\mathcal{B}, \mathcal{D}}$ defined in Eq. \ref{eq:set_of_states_counterexample} for $n=8$. The shaded areas represent the error bars. For the number of samples needed to distinguish whether the distribution is an $\hat{O}_Z$-shadowed $t$-design, see Reference bonet-monroig2024verifying. In this case, we tolerate an error of $\epsilon = 0.07$. For the permutation samples, we use $M_\Pi = 2n$. The green line corresponds to the moment computed with the original observable. The other lines correspond to the moments computed with the observable permuted $1$, $5$, and $15$ times, respectively. When no permutations are applied to the observable, $\mathcal{A}_t(S, \hat{O}_Z)$ is close to zero. Naively, one would interpret this result as the set of states $\mathcal{X}_{\mathcal{B}, \mathcal{D}}$ being an $\hat{O}_Z$-shadowed $t$-design. However, this is far from correct. As soon as one applies permutations to the observable, the average randomness deviates from zero, and therefore the family of states is not actually Haar-randomly distributed.
  • Figure 4: Numerical computations for the statistical moments $\mu_1(z_{\boldsymbol{\theta}}(\boldsymbol{x}))$, $\sigma^2(z_{\boldsymbol{\theta}}(\boldsymbol{x}))$ for feature-map variational QML models, as a function of the number of layers. Results are shown for both the brick and non-brick ansatzes (see Appendix \ref{Appendix:learning_problem} for details on the circuit). The first row shows the mean and variance over the training set, using optimized parameters obtained via L-BFGS-B. The absence of error bars in these figures is due to the fact that we use optimal parameters and a fixed data set, thus the statistical moments can be computed exactly. The second row displays the mean and variance over the test set sampled from the data distribution. In the third row, parameters $\boldsymbol{\theta}$ are sampled randomly from a uniform distribution. The statistical moments are computed via Monte Carlo sampling, and the shaded areas represent the error bars. $(a)$ Mean shifted to $1/2$ and $(b)$ variance of $z_{\boldsymbol{\theta}}(\boldsymbol{x})$ for the brick feature map classifier. Mean shifted to $1/2$ and $(b)$ variance of $z_{\boldsymbol{\theta}}(\boldsymbol{x})$ for the non-brick feature map classifier.
  • Figure 5: Numerical computations for the statistical moments $\mu_1(z_{\boldsymbol{\theta}}(\boldsymbol{x}))$, $\sigma^2(z_{\boldsymbol{\theta}}(\boldsymbol{x}))$ for data re-uploading model, as a function of the number of layers. The first row shows the mean and variance over the training set, using optimized parameters obtained via L-BFGS-B. In these plots, error bars are absent because the training set is equispaced. In this case, Monte Carlo error does not apply, as we are not sampling from a random distribution. The second row displays the mean and variance over the test set sampled from the data distribution. In the third row, parameters $\boldsymbol{\theta}$ are sampled randomly from a uniform distribution. The statistical moments are computed via Monte Carlo sampling, and the shaded areas represent the error bars. $(a)$ Mean of $z_{\boldsymbol{\theta}}(\boldsymbol{x})$ shifted to $1/2$. $(b)$ Variance of $z_{\boldsymbol{\theta}}(\boldsymbol{x})$.
  • ...and 1 more figure
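
The captions above compare moments of observable expectation values against Haar-random behavior. As a rough illustration of that idea, the sketch below estimates the first two moments of $\langle\psi|\hat{O}|\psi\rangle$ over sampled Haar-random states and compares them with the known Haar values $\mathrm{Tr}(\hat{O})/d$ and $(\mathrm{Tr}(\hat{O})^2 + \mathrm{Tr}(\hat{O}^2))/(d(d+1))$. The paper's definition of the anti-randomness $\mathcal{A}_t$ is not reproduced in this summary, so the code does not compute it. The function names, the choice of $n = 3$ qubits, and the single-qubit $Z$ observable are assumptions made for the example.

```python
import numpy as np


def haar_states(num_states, dim, rng):
    """Sample Haar-random pure states as normalized complex Gaussian vectors."""
    vecs = rng.normal(size=(num_states, dim)) + 1j * rng.normal(size=(num_states, dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


def expectation_values(states, observable):
    """Return <psi|O|psi> for every row psi of `states`."""
    return np.einsum("si,ij,sj->s", states.conj(), observable, states).real


def haar_moments(observable):
    """Exact first and second Haar moments of <psi|O|psi>."""
    dim = observable.shape[0]
    tr = np.trace(observable).real
    tr_sq = np.trace(observable @ observable).real
    return tr / dim, (tr**2 + tr_sq) / (dim * (dim + 1))


n_qubits = 3  # illustrative size, not the n = 8 used in Figure 3
dim = 2**n_qubits
pauli_z = np.diag([1.0, -1.0])
observable = np.kron(pauli_z, np.eye(dim // 2))  # Z on the first qubit

rng = np.random.default_rng(0)
haar_values = expectation_values(haar_states(20000, dim, rng), observable)
empirical = (haar_values.mean(), (haar_values**2).mean())
exact = haar_moments(observable)

# Contrast: computational basis states match the first Haar moment only.
basis_values = expectation_values(np.eye(dim, dtype=complex), observable)
basis = (basis_values.mean(), (basis_values**2).mean())

print("Haar sample moments:", empirical)
print("Exact Haar moments: ", exact)
print("Basis-state moments:", basis)
```

The basis-state ensemble reproduces the first Haar moment but not the second, a small-scale reminder that agreement on a limited set of moments or observables does not by itself show that a family of states is Haar-distributed.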

Theorems & Definitions (22)

  • Definition 1: (Spherical) $t$-designs [DELSARTE1976230, ambainis2007quantum]
  • Definition 2: $\hat{O}$-shadowed $t$-design
  • Definition 3: Class margin
  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 2
  • ...and 12 more