A Semidefinite Programming-Based Branch-and-Cut Algorithm for Biclustering
Antonio M. Sudoso
TL;DR
This work addresses the k-densest-disjoint biclique (k-DDB) biclustering problem with a tailored branch-and-cut algorithm based on semidefinite programming (SDP). Tight upper bounds come from an SDP relaxation (lifted to a rank-constrained form) that is strengthened with valid inequalities and solved in a cutting-plane scheme, while a rounding-based heuristic yields high-quality feasible biclusters for lower bounds. A specialized branching strategy reduces problem size by enforcing must-link and cannot-link constraints within lower-dimensional SDP subproblems. Computational results on synthetic and real-world gene-expression datasets show that the method can solve instances with up to about 1248 vertices, roughly 20 times larger than those handled by general-purpose solvers, which highlights the practical scalability and robustness of the approach. The solver is publicly available, which supports reproducibility and further work on global optimization for biclustering.
Abstract
Biclustering, also called co-clustering, block clustering, or two-way clustering, involves the simultaneous clustering of both the rows and columns of a data matrix into distinct groups, such that the rows and columns within a group display similar patterns. As a model problem for biclustering, we consider the $k$-densest-disjoint biclique problem, whose goal is to identify $k$ disjoint complete bipartite subgraphs (called bicliques) of a given weighted complete bipartite graph such that the sum of their densities is maximized. To address this problem, we present a tailored branch-and-cut algorithm. For the upper bound routine, we consider a semidefinite programming relaxation and propose valid inequalities to strengthen the bound. We solve this relaxation in a cutting-plane fashion using a first-order method. For the lower bound, we design a maximum weight matching rounding procedure that exploits the solution of the relaxation solved at each node. Computational results on both synthetic and real-world instances show that the proposed algorithm can solve instances approximately 20 times larger than those handled by general-purpose solvers.
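To make the objective concrete, the following minimal Python sketch evaluates the sum of biclique densities for a candidate set of disjoint bicliques on a small weight matrix. The abstract does not state how density is normalized, so the sketch assumes one common convention: total edge weight divided by the number of vertices in the biclique. The function names `biclique_density` and `kddb_objective` and the toy matrix are hypothetical and are not taken from the paper or its solver.

```python
import numpy as np


def biclique_density(A, rows, cols):
    """Density of the biclique induced by `rows` and `cols` of A.

    ASSUMPTION: density = total edge weight / number of vertices.
    The paper's exact normalization may differ.
    """
    rows, cols = list(rows), list(cols)
    if not rows or not cols:
        return 0.0
    weight = A[np.ix_(rows, cols)].sum()
    return float(weight) / (len(rows) + len(cols))


def kddb_objective(A, bicliques):
    """Sum of densities over a list of (rows, cols) bicliques.

    Raises ValueError if two bicliques share a row or a column,
    since the bicliques must be pairwise disjoint.
    """
    used_rows, used_cols = set(), set()
    total = 0.0
    for rows, cols in bicliques:
        rows, cols = set(rows), set(cols)
        if rows & used_rows or cols & used_cols:
            raise ValueError("bicliques must be disjoint")
        used_rows |= rows
        used_cols |= cols
        total += biclique_density(A, sorted(rows), sorted(cols))
    return total


# Toy 3 x 4 weight matrix with two visible blocks (illustrative data only).
A = np.array([
    [5.0, 5.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
])

aligned = [([0, 1], [0, 1]), ([2], [2, 3])]     # follows the block structure
misaligned = [([0, 2], [0, 1]), ([1], [2, 3])]  # mixes the blocks

print(kddb_objective(A, aligned))     # 20/4 + 6/3 = 7.0
print(kddb_objective(A, misaligned))  # 10/4 + 0/3 = 2.5
```

Under this assumed normalization, the block-aligned choice scores higher than the misaligned one. A branch-and-cut method searches over all such choices while using relaxation bounds to certify that no better one exists.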
