The role of data-induced randomness in quantum machine learning classification tasks
Berta Casas, Xavier Bonet-Monroig, Adrián Pérez-Salinas
TL;DR
This work analyzes how data-embedding choices influence quantum machine learning for binary classification by introducing the class margin, a metric that links data-induced randomness to classification accuracy via shadowed observable moments. It proves that when embedded states resemble Haar-random states, classification performance is fundamentally limited, and demonstrates this through a Discrete Logarithm Problem–based example, an observable-bias study, and a comparison of feature-map versus data re-uploading models. The results show that avoiding Haar-like distributions in embeddings and carefully selecting observables are crucial for practical QML, with the class margin offering a diagnostic tool to assess and guide embedding design. Overall, the paper provides analytical bounds and practical insights that connect average randomness, design theory, and generalization considerations to the viability of QML classifiers on near-term devices.
Abstract
Quantum machine learning (QML) has emerged as a prominent area of research that aims to go beyond the capabilities of classical machine learning models. A critical aspect of any learning task is the process of data embedding, which directly affects model performance: a poorly designed data-embedding strategy can severely hinder the success of a learning task. Despite its importance, rigorous analyses of data-embedding effects are limited, leaving many cases without effective assessment methods. In this work, we introduce a metric for binary classification tasks, the class margin, by merging the concepts of average randomness and classification margin. This metric analytically connects data-induced randomness with classification accuracy for a given data-embedding map. We benchmark a range of data-embedding strategies through the class margin, demonstrating that data-induced randomness imposes a limit on classification performance. We expect this work to provide a new approach to evaluate QML models by their data-embedding processes, addressing gaps left by existing analytical tools.
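The claim that Haar-like embedded states limit classification can be illustrated numerically. The sketch below is not the paper's class margin, which is not defined in this summary. It only assumes the standard result that, for Haar-random pure states in dimension d, the expectation value of a Pauli observable has mean 0 and variance 1/(d+1), so a fixed observable has less and less room to separate two classes as the number of qubits grows. All function names are hypothetical and chosen for illustration.

```python
import numpy as np


def haar_random_states(num_states, dim, rng):
    """Sample Haar-random pure states as normalized complex Gaussian vectors."""
    z = rng.normal(size=(num_states, dim)) + 1j * rng.normal(size=(num_states, dim))
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def pauli_z_first_qubit_diagonal(num_qubits):
    """Diagonal of Z on the first qubit, identity on the remaining qubits."""
    dim = 2 ** num_qubits
    diag = np.ones(dim)
    diag[dim // 2:] = -1.0
    return diag


def expectation_stats(num_qubits, num_states=2000, seed=0):
    """Empirical mean and variance of <psi|Z_1|psi> over Haar-random states."""
    rng = np.random.default_rng(seed)
    states = haar_random_states(num_states, 2 ** num_qubits, rng)
    diag = pauli_z_first_qubit_diagonal(num_qubits)
    # For a diagonal observable, <psi|O|psi> = sum_i |psi_i|^2 * O_ii.
    expvals = (np.abs(states) ** 2) @ diag
    return expvals.mean(), expvals.var()


for n in (2, 4, 6, 8):
    mean, var = expectation_stats(n)
    # Compare the empirical variance with the Haar prediction 1 / (d + 1).
    print(f"qubits={n}  mean={mean:+.4f}  var={var:.5f}  1/(d+1)={1 / (2 ** n + 1):.5f}")
```

The variance shrinks roughly as 1/2^n, so the outputs for any two sets of Haar-like embedded states cluster around the same value. This is the intuition behind avoiding Haar-like distributions in the embedding.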
