Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better
Vicente Balmaseda, Ying Xu, Yixin Cao, Nate Veldt
TL;DR
This work addresses Cluster Deletion, the NP-hard problem of deleting the minimum number of edges to turn a graph into a disjoint union of cliques, with combinatorial algorithms that are simpler, faster, and come with better guarantees. It tightens the approximation guarantee from 4 to 3 for both the MatchFlipPivot algorithm and the STC-LP rounding algorithm, introduces a simple degree-based derandomization of the pivot step, and gives a fast, purely combinatorial STC-LP solver that reduces the LP to a minimum $s$-$t$ cut problem. The paper also computes lower bounds faster via maximal sets of edge-disjoint open wedges, and a Julia implementation scales to graphs with millions of nodes while outperforming black-box LP solvers in practice. Together, these results narrow the gap between theory and practice for Cluster Deletion by providing deterministic, scalable methods with provable guarantees. This matters for large-scale graph clustering in computational biology and social network analysis, where clique-based partitioning becomes feasible on much larger datasets.
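To illustrate the open-wedge lower bound: an open wedge is a vertex triple with edges c-a and c-b but no edge a-b, and any disjoint union of cliques must delete at least one of its two edges. The size of any edge-disjoint set of open wedges is therefore a lower bound on the optimal number of deletions. The Python sketch below greedily builds a maximal such set. The function name, the adjacency-dictionary input, and the scan order are assumptions made for illustration; this is not the paper's Julia implementation.

```python
from itertools import combinations

def edge_disjoint_open_wedges(adj):
    """Greedily collect a maximal set of edge-disjoint open wedges.

    adj maps each vertex to the set of its neighbours (undirected, simple).
    Returns a list of (center, a, b) with edges center-a and center-b
    present and edge a-b absent; no edge is used by two wedges.
    """
    used = set()    # edges already claimed by a chosen wedge
    wedges = []
    for c in sorted(adj):
        for a, b in combinations(sorted(adj[c]), 2):
            e1, e2 = frozenset((c, a)), frozenset((c, b))
            # Skip closed triangles and wedges sharing an edge with a chosen one.
            if b in adj[a] or e1 in used or e2 in used:
                continue
            used.update((e1, e2))
            wedges.append((c, a, b))
    return wedges

# Path 0-1-2-3: the greedy scan keeps one wedge, so at least 1 deletion is needed.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
path_wedges = edge_disjoint_open_wedges(path)

# A triangle has no open wedge, so the bound is 0.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
triangle_wedges = edge_disjoint_open_wedges(triangle)
```

On the path, the bound of 1 is tight, since deleting the middle edge leaves two cliques. In general the bound depends on the scan order and need not match the optimum.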
Abstract
Cluster deletion is an NP-hard graph clustering objective with applications in computational biology and social network analysis, where the goal is to delete a minimum number of edges to partition a graph into cliques. We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3. Moreover, we show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a vertex of maximum degree in an auxiliary graph and forming a cluster around it. One of these algorithms relies on solving a linear program. Our final contribution is to design a new and purely combinatorial approach for doing so that is far more scalable in theory and practice.
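The greedy derandomization can be sketched as follows. The abstract does not define the auxiliary graph, so the example assumes one possible choice: the input graph with the edges of a maximal edge-disjoint set of open wedges removed. All function and variable names are hypothetical, and this is an illustrative sketch rather than the authors' algorithm.

```python
def max_degree_pivot(aux_adj):
    """Repeatedly take a maximum-degree vertex of the auxiliary graph and
    form a cluster from it and its remaining neighbours."""
    remaining = {v: set(nbrs) for v, nbrs in aux_adj.items()}
    clusters = []
    while remaining:
        # Ties are broken toward the smallest vertex label.
        pivot = max(sorted(remaining), key=lambda v: len(remaining[v]))
        cluster = {pivot} | remaining[pivot]
        clusters.append(cluster)
        for v in cluster:
            del remaining[v]
        for nbrs in remaining.values():
            nbrs -= cluster
    return clusters

def deleted_edges(adj, clusters):
    """Count edges of the original graph that cross between clusters."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(1 for u in adj for v in adj[u] if u < v and label[u] != label[v])

# Original graph: triangle 0-1-2 plus a pendant edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
# Assumed auxiliary graph: the edges of the open wedge (2; 0, 3),
# namely 2-0 and 2-3, have been removed from the original graph.
aux = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}

clusters = max_degree_pivot(aux)
cost = deleted_edges(graph, clusters)
```

Here vertex 1 has the highest auxiliary degree, so the first cluster is {0, 1, 2}, which is a clique in the original graph, and vertex 3 is left as a singleton. Only edge 2-3 is deleted.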
