Learning Reduced Representations for Quantum Classifiers
Patrick Odagiu, Vasilis Belis, Lennart Schulze, Panagiotis Barkoutsos, Michele Grossi, Florentin Reiter, Günther Dissertori, Ivano Tavernelli, Sofia Vallecorsa
TL;DR
The paper addresses the difficulty of applying quantum machine learning (QML) to high-dimensional data by systematically benchmarking conventional dimensionality reduction and autoencoder methods on a 67-feature high-energy physics (HEP) data set, then evaluating their impact on a quantum support vector machine (QSVM). It demonstrates that autoencoder-based reductions, particularly the Sinkclass architecture that couples an encoder to a classifier with Sinkhorn regularization, produce more discriminative latent spaces than traditional methods, achieving a QSVM AUC of around 0.73–0.74 on the ttH(bb) task. The study provides a practical recipe for applying dimensionality reduction in QML, showing how to balance reconstruction quality against downstream classification performance, and offers public data and code for reproducibility. Overall, the work extends quantum classifiers to high-dimensional scientific data and guides the choice of reduction strategy in future QML pipelines.
Abstract
Data sets that are specified by a large number of features are currently outside the area of applicability for quantum machine learning algorithms. An immediate solution to this impasse is the application of dimensionality reduction methods before passing the data to the quantum algorithm. We apply six conventional feature extraction algorithms and five autoencoder-based dimensionality reduction models to a particle physics data set with 67 features. The reduced representations generated by these models are then used to train a quantum support vector machine to solve a binary classification problem: whether a Higgs boson is produced in proton collisions at the LHC. We show that the autoencoder methods learn a better lower-dimensional representation of the data, with the method we designed, the Sinkclass autoencoder, performing 40% better than the baseline. The methods developed here open up the applicability of quantum machine learning to a larger array of data sets. Moreover, we provide a recipe for effective dimensionality reduction in this context.
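The two-stage pipeline described in the abstract (reduce the features, then train a kernel classifier on the reduced representation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data is synthetic, PCA stands in for the reduction step, the latent dimension `n_latent = 8` is hypothetical, and the quantum kernel is replaced by the closed-form fidelity of a simple product-state angle encoding, evaluated classically with NumPy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def fidelity_kernel(a, b):
    """Fidelity |<psi(x)|psi(y)>|^2 for a product-state angle encoding.

    Assumption: each feature x_i is loaded by a single-qubit RY(x_i)
    rotation, so the overlap factorises into prod_i cos^2((x_i - y_i) / 2).
    This is an illustrative feature map, not the one used in the paper.
    """
    diff = a[:, None, :] - b[None, :, :]  # shape (n_a, n_b, n_features)
    return np.prod(np.cos(diff / 2.0) ** 2, axis=-1)


# Synthetic stand-in for a 67-feature data set (not the real HEP data).
X, y = make_classification(
    n_samples=400, n_features=67, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Stage 1: dimensionality reduction (PCA as a conventional placeholder).
n_latent = 8  # hypothetical latent dimension, one qubit per feature
pca = PCA(n_components=n_latent, random_state=0).fit(X_train)

# Rescale latent features to rotation angles in [0, pi] (fit on train only).
scaler = MinMaxScaler(feature_range=(0.0, np.pi)).fit(pca.transform(X_train))
Z_train = scaler.transform(pca.transform(X_train))
Z_test = scaler.transform(pca.transform(X_test))

# Stage 2: SVM on a precomputed kernel matrix.
K_train = fidelity_kernel(Z_train, Z_train)  # (n_train, n_train)
K_test = fidelity_kernel(Z_test, Z_train)    # (n_test, n_train)

svm = SVC(kernel="precomputed").fit(K_train, y_train)
auc = roc_auc_score(y_test, svm.decision_function(K_test))
print(f"test AUC: {auc:.3f}")
```

Passing the kernel as a precomputed matrix keeps the classifier independent of how the kernel entries are obtained, so the classical `fidelity_kernel` could in principle be replaced by values estimated on quantum hardware or a simulator without changing the SVM code.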
