Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better

Vicente Balmaseda, Ying Xu, Yixin Cao, Nate Veldt

TL;DR

This work tackles Cluster Deletion, the NP-hard problem of deleting edges to obtain a disjoint union of cliques, by delivering simpler, faster, and stronger combinatorial algorithms that bridge theory and practice. It tightens the theoretical guarantees to a 3-approximation for both the MatchFlipPivot approach and the STC-LP rounding, while introducing a simple degree-based pivot derandomization and a fast purely combinatorial STC-LP solver that reduces to a minimum $s$-$t$ cut problem. The paper also provides faster lower bounds via maximal edge-disjoint open wedges and demonstrates scalability to graphs with millions of nodes using a Julia implementation, outperforming black-box LP solvers in practice. Collectively, these results close the theory-practice gap for Cluster Deletion by delivering deterministic, scalable methods with provable guarantees and compelling empirical performance. The work thus has practical impact for large-scale graph clustering tasks in biology and social networks, enabling reliable clique-based partitioning on datasets far larger than previously feasible.
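The lower bound mentioned above rests on a simple fact: an open wedge (a center vertex adjacent to two vertices that are not adjacent to each other) cannot survive inside a clique partition, so at least one of its two edges must be deleted. If the wedges share no edges, each one forces a distinct deletion, so the size of any edge-disjoint collection is a lower bound on the optimum. The following is a minimal sketch of a greedy construction, not the paper's implementation; the function name, the adjacency-dict input, and the scan order are assumptions made for illustration.

```python
from itertools import combinations


def maximal_edge_disjoint_open_wedges(adj):
    """Greedily pick open wedges (i, k, j) so that no edge is used twice.

    adj maps each vertex to the set of its neighbors (undirected, no self-loops).
    Hypothetical helper: the scan order below is an arbitrary choice.
    """
    used = set()  # edges already claimed by a chosen wedge
    wedges = []
    for k in sorted(adj):  # k is the wedge center
        for i, j in combinations(sorted(adj[k]), 2):
            if j in adj[i]:
                continue  # i, j, k form a triangle, not an open wedge
            e1, e2 = frozenset((k, i)), frozenset((k, j))
            if e1 in used or e2 in used:
                continue  # would reuse an edge of an earlier wedge
            used.update((e1, e2))
            wedges.append((i, k, j))
    return wedges


# Path 0-1-2-3: the wedge centered at 1 is taken, which blocks the one at 2.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
wedges = maximal_edge_disjoint_open_wedges(path)
print(len(wedges))  # 1, a lower bound on the number of edge deletions
```

Every open wedge is examined once while its center is processed, and claimed edges are never released, so the returned collection is maximal. On the path example the bound is tight: deleting the middle edge leaves two cliques.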

Abstract

Cluster deletion is an NP-hard graph clustering objective with applications in computational biology and social network analysis, where the goal is to delete a minimum number of edges to partition a graph into cliques. We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3. Moreover, we show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a vertex of maximum degree in an auxiliary graph and forming a cluster around it. One of these algorithms relies on solving a linear program. Our final contribution is to design a new and purely combinatorial approach for doing so that is far more scalable in theory and practice.
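The greedy derandomization described in the abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the auxiliary graph is taken as given (its construction is not shown in this excerpt), degrees are recomputed among the not-yet-clustered vertices, and ties are broken by smallest label. The name `degree_pivot` is hypothetical.

```python
def degree_pivot(aux_adj):
    """Cluster an auxiliary graph by repeatedly pivoting on a max-degree vertex.

    aux_adj maps each vertex to the set of its neighbors in the auxiliary graph.
    Assumptions (not from the source): degree is counted among remaining
    vertices only, and ties go to the smallest vertex label.
    """
    remaining = set(aux_adj)
    clusters = []
    while remaining:
        # max() keeps the first maximum, so sorting gives smallest-label ties.
        pivot = max(sorted(remaining), key=lambda v: len(aux_adj[v] & remaining))
        cluster = {pivot} | (aux_adj[pivot] & remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters


# A triangle {0, 1, 2} and a separate edge {3, 4}.
aux = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
print(degree_pivot(aux))  # [{0, 1, 2}, {3, 4}]
```

The sketch rescans all remaining vertices on every round, which is quadratic in the worst case; a bucket queue keyed by degree would be the natural replacement for large graphs.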

Paper Structure

This paper contains 28 sections, 6 theorems, 19 equations, 5 figures, 1 table, 7 algorithms.

Key Result

Lemma 3.1

Let $\mathcal{B}$ be the set of edges between clusters and $\mathcal{N}$ be the set of non-edges inside clusters that result from running Algorithm \ref{alg:piv}. If Pivot Strategy 1 or 2 is used, then $|\mathcal{B}| \leq 2|\mathcal{N}|$. If Pivot Strategy 3 is used, this holds in expectation: $\mathbb{E}[|\mathcal{B}|] \leq 2\,\mathbb{E}[|\mathcal{N}|]$.
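To make the two quantities in the lemma concrete, the sketch below counts edges between clusters and non-edges inside clusters for a given clustering. It is only a checking aid: the function name is hypothetical, and the graph passed in is assumed to be whichever graph the pivoting procedure was run on, which this excerpt does not specify.

```python
from itertools import combinations


def count_between_and_missing(adj, clusters):
    """Return (|B|, |N|): edges across clusters and non-edges within clusters.

    adj maps each vertex to its neighbor set; clusters is a list of disjoint
    vertex sets covering all vertices. Vertex labels must be comparable.
    """
    label = {v: idx for idx, cluster in enumerate(clusters) for v in cluster}
    # Count each undirected edge once (u < v) if its endpoints are separated.
    between = sum(
        1 for u in adj for v in adj[u] if u < v and label[u] != label[v]
    )
    # Count vertex pairs placed together that are not adjacent.
    missing = sum(
        1
        for cluster in clusters
        for u, v in combinations(sorted(cluster), 2)
        if v not in adj[u]
    )
    return between, missing


# Path 0-1-2-3 clustered as {0, 1, 2} and {3}, as if vertex 1 were the pivot.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
b, n = count_between_and_missing(path, [{0, 1, 2}, {3}])
print(b, n, b <= 2 * n)  # 1 1 True
```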

Figures (5)

  • Figure 1: The example for Theorem \ref{lem:mfp-best}.
  • Figure 2: Approximation ratios ($|E_D|/|W|$) for MFP.
  • Figure 3: Runtimes of the MFP algorithms using different pivoting strategies. Each point represents one graph.
  • Figure 4: Improved approximation ratios when incorporating a cluster merging step after DegMFP.
  • Figure 5: Runtimes of two different solvers for the STC LP. Each point represents a graph. Points above the black dashed line indicate graphs for which the given STC LP solver did not find a solution. The two vertical dashed lines indicate the size of the largest graph (in terms of edges) for which each method was able to successfully solve the LP.

Theorems & Definitions (11)

  • Lemma 3.1
  • proof
  • Theorem 3.2
  • proof
  • Theorem 3.3
  • proof
  • Theorem 3.4
  • proof
  • Lemma 4.1
  • Lemma 4.2
  • ...and 1 more