Quantum Clustering with k-Means: a Hybrid Approach
Alessandro Poggiali, Alessandro Berti, Anna Bernasconi, Gianna M. Del Corso, Riccardo Guidotti
TL;DR
This work tackles speeding up the cluster assignment step of $k$-Means with three hybrid quantum algorithms that perform distance computations in parallel: $q_{1:1}$-$k$-Means, $q_{1:k}$-$k$-Means, and $q_{M:k}$-$k$-Means. The algorithms rely on quantum distance circuits, Inverse Stereographic Projection (ISP) for data normalization, and FF-QRAM for data loading, and the work analyzes their post-selection and shot requirements. Empirical results on synthetic and real datasets show that, given sufficient shots, the quantum variants achieve clustering quality comparable to $\boldsymbol{\tau}$-$k$-Means and classical $k$-Means, although practical benefits are limited by post-selection costs and data-loading overhead. Experiments on real quantum hardware with tiny instances confirm feasibility but reveal considerable noise and overhead. This indicates that significant advances in quantum hardware and data-loading techniques are needed before quantum clustering can deliver practical advantages.
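Amplitude encoding requires unit-norm state vectors. Plain L2 normalization discards each record's norm, whereas an inverse stereographic projection maps $\mathbb{R}^d$ injectively onto a sphere in $\mathbb{R}^{d+1}$, so no information is lost. The sketch below is a minimal illustration, assuming projection from the north pole onto a unit-radius sphere; the paper's exact radius or scaling convention may differ, and the function names are hypothetical. It also checks the identity $\lVert a-b\rVert^2 = 2 - 2\langle a,b\rangle$ for unit vectors, which is why circuits that estimate state overlaps can be used to recover Euclidean distances.

```python
import math


def inverse_stereographic_projection(x):
    """Map a point x in R^d onto the unit sphere S^d in R^(d+1).

    Hypothetical helper: assumes projection from the north pole of a
    unit-radius sphere; the paper's exact convention may differ.
    """
    sq_norm = sum(v * v for v in x)
    denom = sq_norm + 1.0
    # The first d coordinates shrink toward the origin as ||x|| grows.
    head = [2.0 * v / denom for v in x]
    # The extra coordinate goes from -1 (x = 0) toward +1 (||x|| -> infinity).
    tail = (sq_norm - 1.0) / denom
    return head + [tail]


def sq_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def inner(a, b):
    """Real inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))


if __name__ == "__main__":
    p = inverse_stereographic_projection([3.0, 4.0])
    q = inverse_stereographic_projection([-1.0, 0.5])
    # Projected points have unit norm, so they can be amplitude-encoded directly.
    print(math.isclose(math.sqrt(inner(p, p)), 1.0))
    # On the unit sphere, squared distance depends only on the inner product.
    print(math.isclose(sq_distance(p, q), 2.0 - 2.0 * inner(p, q)))
```

Unlike dividing by the norm, this mapping keeps records that differ only in magnitude (for example $(1, 1)$ and $(2, 2)$) distinct after normalization.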
Abstract
Quantum computing is a promising paradigm, grounded in quantum theory, for performing fast computations. Quantum algorithms are expected to surpass their classical counterparts in computational complexity for certain tasks, including machine learning. In this paper, we design, implement, and evaluate three hybrid quantum k-Means algorithms that exploit different degrees of parallelism. Each algorithm leverages quantum parallelism more extensively than the previous one, progressively reducing the complexity of the cluster assignment step down to a constant cost. In particular, we exploit quantum phenomena to speed up the computation of distances. The core idea is that the distances between records and centroids can be computed simultaneously, saving time, especially on large datasets. We show that our hybrid quantum k-Means algorithms can be more efficient than the classical version while still obtaining comparable clustering results.
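For reference, the classical assignment step that these hybrid algorithms target evaluates every record-centroid distance one at a time, i.e. $n \cdot k$ evaluations per iteration for $n$ records and $k$ centroids. The following is a minimal pure-Python sketch of that classical step only; the function name and toy data are hypothetical, and no quantum circuit is modeled.

```python
import math


def assign_clusters(records, centroids):
    """Classical k-Means assignment step (hypothetical helper).

    Computes every record-centroid distance sequentially (n * k evaluations)
    and returns each record's nearest-centroid index plus the evaluation count.
    """
    labels = []
    evaluations = 0
    for x in records:
        best_j, best_d = None, math.inf
        for j, c in enumerate(centroids):
            d = math.dist(x, c)  # Euclidean distance (Python 3.8+)
            evaluations += 1
            if d < best_d:
                best_j, best_d = j, d
        labels.append(best_j)
    return labels, evaluations


if __name__ == "__main__":
    records = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (4.8, 5.3)]
    centroids = [(0.0, 0.0), (5.0, 5.0)]
    labels, n_evals = assign_clusters(records, centroids)
    print(labels, n_evals)  # [0, 0, 1, 1] 8
```

The quantum variants aim to replace this sequential double loop with simultaneous distance evaluations; how many records and centroids each circuit handles is specific to each variant and is not modeled here.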
