Biclustering a dataset using photonic quantum computing

Ajinkya Borle, Ameya Bhave

TL;DR

The paper investigates applying photonic quantum computing, namely boson sampling (BS) and Gaussian boson sampling (GBS), to biclustering, a problem that jointly clusters rows and columns based on a criterion such as high values or binary regularity. It develops two main approaches: a BS-based heuristic that embeds the data in a unitary and uses simulated annealing to select column subsets, and a GBS-based method that maps the data to an adjacency matrix and leverages the Autonne-Takagi decomposition to extract rows and columns simultaneously. Through four preliminary, noiseless simulations on 12×12 datasets, the study shows that both BS and GBS can identify biclusters, with GBS generally requiring fewer samples and performing well on binary data, while effectiveness depends strongly on the contrast between the bicluster values and the rest of the data. The results suggest potential utility for photonic quantum approaches in unsupervised data mining, while highlighting practical challenges such as sample efficiency, postselection, and the need for hardware demonstrations and comparisons with classical baselines. The work lays a foundation for future hybrid quantum-classical strategies and real-data experiments to assess practical impact in biclustering tasks.
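
The summary says only that the BS heuristic "embeds the data in a unitary" and does not give the construction. A minimal sketch, assuming the standard Halmos unitary dilation (our choice, not confirmed as the authors' method): rescale the data matrix $D$ to unit spectral norm, giving $A = D/\|D\|_2$, and place it in the top-left block of the $2n \times 2n$ unitary $U = \begin{pmatrix} A & (I - AA^\top)^{1/2} \\ (I - A^\top A)^{1/2} & -A^\top \end{pmatrix}$. Photons injected into the modes of the chosen columns and detected in row modes then occur with probabilities governed by permanents of submatrices of $A$; photons leaving through the extra $n$ modes would have to be discarded, which is presumably one source of the postselection overhead mentioned above. The function name `embed_in_unitary` is hypothetical, the code assumes a real, square, nonzero data matrix, and the simulated annealing over column subsets is not shown.

```python
import numpy as np


def embed_in_unitary(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper: embed a real square data matrix into a 2n x 2n real
    orthogonal (hence unitary) matrix via the Halmos dilation, after rescaling
    the data to unit spectral norm."""
    w, s, vh = np.linalg.svd(data)
    s = s / s.max()                                  # rescaled singular values, all <= 1
    a = (w * s) @ vh                                 # A = D / ||D||_2
    c = np.sqrt(np.clip(1.0 - s**2, 0.0, None))      # clip guards against round-off
    left = (w * c) @ w.T                             # (I - A A^T)^{1/2}
    right = (vh.T * c) @ vh                          # (I - A^T A)^{1/2}
    return np.block([[a, left], [right, -a.T]])


# Usage on a hypothetical 12 x 12 dataset with entries in [0, 1].
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=(12, 12))
unitary = embed_in_unitary(data)
print(unitary.shape)                                 # (24, 24)
print(np.allclose(unitary @ unitary.T, np.eye(24)))  # True
```

Working from the SVD keeps both square-root blocks in the singular bases of $A$, so the off-diagonal blocks of $UU^\top$ cancel exactly up to round-off.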

Abstract

Biclustering is a problem in machine learning and data mining that seeks to group together rows and columns of a dataset according to certain criteria. In this work, we highlight the natural relation that quantum computing models like boson and Gaussian boson sampling (GBS) have to this problem. We first explore the use of boson sampling to identify biclusters based on matrix permanents. We then propose a heuristic that finds clusters in a dataset using Gaussian boson sampling by (i) converting the dataset into a bipartite graph and then (ii) running GBS to find the densest sub-graph(s) within the larger bipartite graph. Our simulations for the above proposed heuristics show promising results for future exploration in this area.
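
Step (i) can be made concrete under one natural assumption that the abstract does not spell out: rows and columns become the two vertex sets of a bipartite graph, with edge weight $D_{ij}$ between row $i$ and column $j$, giving the symmetric adjacency matrix $\begin{pmatrix} 0 & D \\ D^\top & 0 \end{pmatrix}$. The Autonne-Takagi decomposition mentioned in the TL;DR factors a symmetric matrix as $A = U\,\mathrm{diag}(\sigma)\,U^\top$ with $U$ unitary and $\sigma \ge 0$; in the usual GBS graph-embedding recipe, $U$ programs the interferometer and the squeezing parameters satisfy $\tanh r_j = c\,\sigma_j$ for a scale $c$ with $c \max_j \sigma_j < 1$. The sketch handles only real data, where the Takagi factors follow from an eigendecomposition; the helper names and the choice of $c$ are hypothetical.

```python
import numpy as np


def bipartite_adjacency(data: np.ndarray) -> np.ndarray:
    """Adjacency matrix of the bipartite graph whose two vertex sets are the rows
    and the columns of `data`; row i and column j are joined with weight data[i, j]."""
    m, n = data.shape
    return np.block([[np.zeros((m, m)), data], [data.T, np.zeros((n, n))]])


def takagi_real_symmetric(a: np.ndarray):
    """Autonne-Takagi factorisation a = u @ diag(sigma) @ u.T of a real symmetric
    matrix, built from its eigendecomposition a = v @ diag(lam) @ v.T."""
    lam, v = np.linalg.eigh(a)
    # Absorb each negative eigenvalue's sign into a phase: (1j) ** 2 == -1.
    phases = np.where(lam < 0, 1j, 1.0)
    return np.abs(lam), v * phases


def squeezing_parameters(sigma: np.ndarray, c: float) -> np.ndarray:
    """Hypothetical helper: squeezing r_j with tanh(r_j) = c * sigma_j."""
    if not 0 < c * sigma.max() < 1:
        raise ValueError("need 0 < c * max(sigma) < 1")
    return np.arctanh(c * sigma)


# Usage on a hypothetical 4 x 4 dataset.
rng = np.random.default_rng(1)
data = rng.uniform(0.0, 1.0, size=(4, 4))
adj = bipartite_adjacency(data)
sigma, u = takagi_real_symmetric(adj)
r = squeezing_parameters(sigma, c=0.5 / sigma.max())   # hypothetical scale choice
print(np.allclose(u @ np.diag(sigma) @ u.T, adj))      # True
```

For a bipartite adjacency matrix the eigenvalues come in $\pm$ pairs, so the Takagi values are the singular values of $D$, each appearing twice.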

Paper Structure (32 sections, 14 equations, 5 figures, 6 tables, 3 algorithms)

Figures (5)

  • Figure 1: Examples of square biclusters in a larger square matrix representing a dataset. (LEFT) A dataset with elements in the range $[0,1]$; (RIGHT) a dataset with binary elements. Each dataset contains two biclusters, distinguished by different colors.
  • Figure 2: Workflow of the boson sampling approach for biclustering. The user chooses the columns, and boson sampling returns the corresponding rows, from which the candidate bicluster is constructed. Because the probability of a boson sampling outcome is proportional to the squared magnitude of the permanent of the corresponding submatrix (the paper's boson sampling probability equation), the bicluster with the highest permanent is expected to also have the highest probability of being obtained (for the initial choice of columns). This figure is for illustrative purposes only.
  • Figure 3: Workflow of the Gaussian boson sampling (GBS) approach for biclustering. Here, GBS returns the rows as well as the columns of potential biclusters (ideally with large hafnian or torontonian values). This figure is for illustrative purposes only.
  • Figure 4: (RIGHT) Heatmap of the dataset used in boson sampling-problem 2. This problem was generated by taking $D^{(2)}$ from boson sampling-problem 1 whose heatmap is on the (LEFT) and then performing a random permutation on its rows and columns.
  • Figure 5: (RIGHT) Heatmap of the dataset with 3 biclusters used in GBS-problem 2. This problem was generated by first creating the dataset whose heatmap is on the (LEFT) and then performing a random permutation on its rows and columns. For further details, refer to the setup section of GBS-problem 2.
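
The permanent and hafnian scores behind the workflows in Figures 2 and 3, and the row/column shuffling used for Figures 4 and 5, can be illustrated on a toy example. This is a minimal classical sketch, not the paper's experiment: the 6×6 dataset, the value ranges, and the random seed are all hypothetical, and permanents and hafnians are computed by brute force, which is only feasible at this size. Given the planted bicluster's columns, the row subset with the largest $|\mathrm{Perm}|^2$ recovers the planted rows; rescaling the data (as an embedding into a unitary would) multiplies every $k \times k$ permanent by the same factor, so the ranking is unchanged. The last lines check the identity $\mathrm{haf}\begin{pmatrix} 0 & B \\ B^\top & 0 \end{pmatrix} = \mathrm{Perm}(B)$, which links the hafnian of a bipartite subgraph to the permanent of the corresponding submatrix.

```python
import itertools

import numpy as np


def permanent(m: np.ndarray) -> float:
    """Permanent by summing over all permutations (brute force, tiny matrices only)."""
    n = m.shape[0]
    return sum(np.prod([m[i, p[i]] for i in range(n)])
               for p in itertools.permutations(range(n)))


def hafnian(a: np.ndarray, idx=None) -> float:
    """Hafnian by recursing over perfect matchings of the vertices in idx."""
    if idx is None:
        idx = tuple(range(a.shape[0]))
    if not idx:
        return 1.0
    if len(idx) % 2:
        return 0.0
    first, rest = idx[0], idx[1:]
    total = 0.0
    for k, partner in enumerate(rest):
        # Pair `first` with `partner`, then match the remaining vertices.
        total += a[first, partner] * hafnian(a, rest[:k] + rest[k + 1:])
    return total


rng = np.random.default_rng(7)                       # hypothetical seed
n, k = 6, 3                                          # hypothetical sizes
data = rng.uniform(0.0, 0.3, size=(n, n))            # low-valued background
data[:k, :k] = rng.uniform(0.8, 1.0, size=(k, k))    # planted high-valued bicluster

# Hide the bicluster by randomly permuting rows and columns.
row_perm, col_perm = rng.permutation(n), rng.permutation(n)
shuffled = data[row_perm][:, col_perm]
bic_rows = np.flatnonzero(row_perm < k).tolist()     # where the planted rows ended up
bic_cols = np.flatnonzero(col_perm < k).tolist()

# Boson-sampling-style scoring: for the chosen columns, rank every k-row subset
# by |Perm|^2, which matches the ideal sampling probability up to a shared factor.
scores = {rows: permanent(shuffled[np.ix_(rows, bic_cols)]) ** 2
          for rows in itertools.combinations(range(n), k)}
best_rows = max(scores, key=scores.get)

# Hafnian of the bipartite subgraph on the bicluster's rows and columns.
sub = shuffled[np.ix_(bic_rows, bic_cols)]
sub_adj = np.block([[np.zeros((k, k)), sub], [sub.T, np.zeros((k, k))]])
print(list(best_rows) == bic_rows)                   # True
print(np.isclose(hafnian(sub_adj), permanent(sub)))  # True
```

With these value ranges the recovery is guaranteed: every permutation term of a submatrix that contains a background row includes one entry of at most 0.3, so its permanent is at most $3! \times 0.3 = 1.8$, while the planted block's permanent is at least $3! \times 0.8^3 \approx 3.07$. Narrowing the gap between bicluster and background values shrinks this margin.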