Biclustering a dataset using photonic quantum computing
Ajinkya Borle, Ameya Bhave
TL;DR
The paper investigates applying photonic quantum computing, namely boson sampling (BS) and Gaussian boson sampling (GBS), to biclustering, a problem that jointly clusters the rows and columns of a dataset based on a criterion such as high values or binary regularity. It develops two main approaches: a boson-sampling-based heuristic that embeds the data in a unitary and uses simulated annealing to select column subsets, and a GBS-based method that maps the data to an adjacency matrix and uses the Autonne-Takagi decomposition to extract rows and columns simultaneously. Four preliminary, noiseless simulations on 12×12 datasets show that both BS and GBS can identify biclusters, with GBS generally requiring fewer samples and performing well on binary data; effectiveness depends strongly on the contrast between bicluster values and the rest of the data. The results suggest potential utility for photonic quantum approaches in unsupervised data mining, while highlighting practical challenges such as sample efficiency, postselection, and the need for hardware demonstrations and comparisons with classical baselines. The work lays a foundation for future hybrid quantum-classical strategies and real-data experiments to assess practical impact on biclustering tasks.
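To make the permanent criterion concrete, here is a minimal classical sketch, assuming a small nonnegative data matrix. In boson sampling, the probability of a collision-free output pattern is proportional to |Perm(U_S)|^2 for the corresponding submatrix U_S of the interferometer unitary, so a k×k block of large values tends to have a large permanent. The sketch skips both the unitary embedding and the simulated annealing described above and simply scores every k-row, k-column submatrix by its permanent, computed with Ryser's formula. The names `permanent` and `best_bicluster` and the toy dataset are hypothetical illustrations, not the paper's implementation.

```python
from itertools import combinations


def permanent(m):
    """Permanent of a square matrix (list of lists) via Ryser's formula, O(2^n * n^2)."""
    n = len(m)
    total = 0
    # Sum over every non-empty column subset S, encoded as a bitmask.
    for mask in range(1, 1 << n):
        cols = [j for j in range(n) if mask >> j & 1]
        prod = 1
        for row in m:
            prod *= sum(row[j] for j in cols)
        total += (-1) ** len(cols) * prod
    return (-1) ** n * total


def best_bicluster(data, k):
    """Return (score, rows, cols) for the k rows and k columns whose submatrix
    has the largest permanent. Exhaustive search: only feasible for tiny inputs."""
    n_rows, n_cols = len(data), len(data[0])
    best = (float("-inf"), None, None)
    for rows in combinations(range(n_rows), k):
        for cols in combinations(range(n_cols), k):
            sub = [[data[r][c] for c in cols] for r in rows]
            score = permanent(sub)
            if score > best[0]:
                best = (score, rows, cols)
    return best


# Hypothetical 5x5 dataset with a planted high-value bicluster at rows {1, 3}, columns {0, 2}.
data = [
    [1, 0, 1, 0, 1],
    [9, 1, 8, 0, 1],
    [0, 1, 0, 1, 0],
    [8, 0, 9, 1, 0],
    [1, 1, 0, 0, 1],
]
score, rows, cols = best_bicluster(data, 2)
print(score, rows, cols)  # 145 (1, 3) (0, 2)
```

The exhaustive scan grows combinatorially with the matrix size, which is one motivation for delegating subset selection to a sampler or a heuristic search instead.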
Abstract
Biclustering is a problem in machine learning and data mining that seeks to group together rows and columns of a dataset according to certain criteria. In this work, we highlight the natural relation between this problem and quantum computing models such as boson sampling and Gaussian boson sampling (GBS). We first explore the use of boson sampling to identify biclusters based on matrix permanents. We then propose a heuristic that finds clusters in a dataset using Gaussian boson sampling by (i) converting the dataset into a bipartite graph and then (ii) running GBS to find the densest subgraph(s) within the larger bipartite graph. Our simulations of the proposed heuristics show promising results for future exploration in this area.
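Step (i) and the link to dense subgraphs can be illustrated with a short sketch, assuming a small nonnegative m×n data matrix B. The bipartite graph has adjacency matrix [[0, B], [B^T, 0]], with the rows of B as one vertex set and the columns as the other. For k chosen rows and k chosen columns, the hafnian of the induced subgraph equals the permanent of the corresponding k×k submatrix of B, and for 0/1 data it counts perfect matchings, so it grows with subgraph density. Under the standard pure-state encoding, GBS samples such a vertex subset with probability proportional to the squared hafnian, which is why dense subgraphs are favored. The sketch replaces sampling with an exhaustive hafnian scan; `bipartite_adjacency`, `hafnian`, `densest_bicluster` and the toy matrix are hypothetical.

```python
from itertools import combinations


def bipartite_adjacency(data):
    """(m+n)x(m+n) adjacency [[0, B], [B^T, 0]]: vertices 0..m-1 are the rows
    of the data matrix B, vertices m..m+n-1 are its columns."""
    m, n = len(data), len(data[0])
    adj = [[0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            adj[i][m + j] = data[i][j]
            adj[m + j][i] = data[i][j]
    return adj


def hafnian(a):
    """Hafnian of a symmetric matrix by expanding over the partner of vertex 0
    (exponential time; sums edge-weight products over all perfect matchings)."""
    n = len(a)
    if n == 0:
        return 1
    if n % 2:
        return 0
    total = 0
    for j in range(1, n):
        if a[0][j]:
            rest = [k for k in range(1, n) if k != j]
            total += a[0][j] * hafnian([[a[r][c] for c in rest] for r in rest])
    return total


def densest_bicluster(data, k):
    """Return (hafnian, rows, cols) for the k rows and k columns whose induced
    bipartite subgraph has the largest hafnian. Exhaustive: tiny inputs only."""
    m, n = len(data), len(data[0])
    adj = bipartite_adjacency(data)
    best = (-1, None, None)
    for rows in combinations(range(m), k):
        for cols in combinations(range(n), k):
            # Column c of the data is vertex m + c in the bipartite graph.
            verts = list(rows) + [m + c for c in cols]
            score = hafnian([[adj[r][c] for c in verts] for r in verts])
            if score > best[0]:
                best = (score, rows, cols)
    return best


# Hypothetical binary 4x4 dataset with a dense block at rows {0, 1}, columns {0, 1}.
data = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]
print(densest_bicluster(data, 2))  # (2, (0, 1), (0, 1))
```

Because the graph is bipartite, every perfect matching pairs a row vertex with a column vertex, which is why the hafnian reduces to a permanent of the selected block of B.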
