Probabilistic Sampling of Balanced K-Means using Adiabatic Quantum Computing
Jan-Nico Zaech, Martin Danelljan, Tolga Birdal, Luc Van Gool
TL;DR
The paper tackles calibrated uncertainty in balanced $K$-means clustering by using adiabatic quantum computing (AQC) to sample high-probability binary cluster assignments from an energy-based model. It formulates the clustering task as a QUBO with one-hot encoding, uses Lagrangian penalties to enforce balance, and relies on D-Wave hardware to obtain samples that approximate a Boltzmann distribution. Posterior calibration is then performed from the measured solutions, which identifies ambiguous points and alternative clusterings. The resulting probabilistic clustering framework includes a Gaussian mixture data model, posterior recomputation from samples, and coresets for scalability. Experiments on synthetic data, IRIS, and high-dimensional image features demonstrate well-calibrated uncertainties and competitive clustering performance. The work highlights the potential of quantum sampling to enrich clustering with informative uncertainty measures and alternative solutions, while acknowledging current hardware constraints and the need to refine problem formulations so that they scale to real-world applications.
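To make the QUBO formulation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the cost is the sum of pairwise squared distances within each cluster, that both the one-hot and the balance constraints are added as quadratic penalties with a single hypothetical weight `penalty`, and that the number of points is divisible by the number of clusters. The function names are illustrative, and a brute-force search over a tiny instance stands in for the annealer.

```python
import itertools
import numpy as np

def balanced_kmeans_qubo(X, k, penalty):
    """Build a QUBO matrix Q so that z @ Q @ z is the penalized clustering energy.

    Binary variable z[i * k + c] = 1 means point i is assigned to cluster c.
    Constant energy offsets are dropped because they do not change the minimizer.
    """
    n = X.shape[0]
    size = n // k  # target cluster size (assumes k divides n)
    D = np.square(X[:, None, :] - X[None, :, :]).sum(-1)  # pairwise squared distances
    Q = np.zeros((n * k, n * k))
    idx = lambda i, c: i * k + c

    for c in range(k):
        for i in range(n):
            # Balance penalty (sum_i z_ic - size)^2, diagonal part (uses z^2 = z).
            Q[idx(i, c), idx(i, c)] += penalty * (1 - 2 * size)
            for j in range(n):
                if i != j:
                    # Clustering cost: each unordered pair contributes D[i, j] once.
                    Q[idx(i, c), idx(j, c)] += 0.5 * D[i, j]
                    # Balance penalty, off-diagonal part.
                    Q[idx(i, c), idx(j, c)] += penalty

    for i in range(n):
        for c in range(k):
            # One-hot penalty (sum_c z_ic - 1)^2, diagonal part.
            Q[idx(i, c), idx(i, c)] -= penalty
            for c2 in range(k):
                if c2 != c:
                    # One-hot penalty, off-diagonal part.
                    Q[idx(i, c), idx(i, c2)] += penalty
    return Q

def brute_force_minimum(Q):
    """Enumerate all binary vectors; only feasible for very small problems."""
    best_z, best_e = None, np.inf
    for bits in itertools.product((0, 1), repeat=Q.shape[0]):
        z = np.array(bits)
        e = z @ Q @ z
        if e < best_e:
            best_z, best_e = z, e
    return best_z, best_e

# Toy data: two well-separated pairs of 1-D points.
X = np.array([[0.0], [0.1], [10.0], [10.1]])
k = 2
Q = balanced_kmeans_qubo(X, k, penalty=10.0)
z_best, e_best = brute_force_minimum(Q)
labels = z_best.reshape(len(X), k).argmax(axis=1)
print(labels)
```

On real hardware the enumeration would be replaced by repeated annealer measurements, each of which yields one candidate binary vector together with its energy.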
Abstract
Adiabatic quantum computing (AQC) is a promising approach for discrete and often NP-hard optimization problems. Current AQCs allow problems of research interest to be implemented, which has sparked the development of quantum representations for many computer vision tasks. Despite requiring multiple measurements from the noisy AQC, current approaches only utilize the best measurement, discarding information contained in the remaining ones. In this work, we explore the potential of using this information for probabilistic balanced k-means clustering. Instead of discarding non-optimal solutions, we propose to use them to compute calibrated posterior probabilities with little additional compute cost. This allows us to identify ambiguous solutions and data points, which we demonstrate on a D-Wave AQC on synthetic tasks and real visual data.
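As an illustration of how non-optimal measurements could be turned into probabilities, the sketch below reweights the distinct measured solutions by a Boltzmann factor of their energies. This is one simple reading of the idea and not necessarily the paper's exact procedure: the function names, the `temperature` parameter, and the toy samples and energies are assumptions made for the example.

```python
import numpy as np

def posterior_from_samples(samples, energies, temperature=1.0):
    """Estimate a posterior over the distinct measured solutions.

    Duplicate measurements are collapsed, then each unique solution is weighted
    by exp(-E / temperature) and the weights are normalized to sum to one.
    """
    samples = np.asarray(samples)
    energies = np.asarray(energies, dtype=float)
    unique, first = np.unique(samples, axis=0, return_index=True)
    e = energies[first]
    # Subtract the minimum energy for numerical stability before exponentiating.
    weights = np.exp(-(e - e.min()) / temperature)
    return unique, weights / weights.sum()

def coassignment_probability(unique, probs):
    """Probability that points i and j share a cluster (label-permutation invariant)."""
    same = unique[:, :, None] == unique[:, None, :]
    return np.einsum("s,sij->ij", probs, same.astype(float))

# Hypothetical measurements: cluster labels per point and the energy of each solution.
samples = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
energies = [0.0, 0.0, 1.0, 0.0]

unique, probs = posterior_from_samples(samples, energies)
coassign = coassignment_probability(unique, probs)
print(unique)
print(probs)
print(coassign[0, 1])
```

Points whose co-assignment probabilities stay far from 0 and 1 are the ambiguous ones, and unique solutions with non-negligible probability correspond to alternative clusterings.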
