Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better

Vicente Balmaseda; Ying Xu; Yixin Cao; Nate Veldt

TL;DR

This work addresses Cluster Deletion, the NP-hard problem of deleting a minimum number of edges so that the remaining graph is a disjoint union of cliques, with combinatorial algorithms that are simpler, faster, and come with stronger guarantees. It tightens the approximation guarantee from 4 to 3 for both the MatchFlipPivot approach and the STC-LP rounding algorithm, introduces a simple degree-based derandomization of the pivot step, and gives a fast, purely combinatorial STC-LP solver that reduces the LP to a minimum $s$-$t$ cut problem. The paper also provides faster lower bounds via maximal edge-disjoint open wedge sets and demonstrates scalability to graphs with millions of nodes using a Julia implementation that outperforms black-box LP solvers in practice. Together, these results narrow the theory-practice gap for Cluster Deletion with deterministic, scalable methods that have provable guarantees. This makes clique-based partitioning practical on larger graphs in applications such as computational biology and social network analysis.
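The open-wedge lower bound rests on a simple observation. In an open wedge (a centre vertex $j$ adjacent to both $i$ and $k$, where $i$ and $k$ are not adjacent), at least one of the two edges must be deleted in any partition into cliques. The size of any set of pairwise edge-disjoint open wedges is therefore at most the optimum. The following is a minimal Python sketch of a greedy maximal set. The function name, the adjacency-dict input, and the scan order are illustrative assumptions; this is not the paper's faster algorithm or its Julia implementation.

```python
from itertools import combinations

def maximal_edge_disjoint_open_wedges(adj):
    """Greedily collect open wedges (i, j, k), centred at j, that share no edge.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected simple graph; hypothetical input format).
    """
    used = set()    # edges already claimed by a chosen wedge
    wedges = []
    for j in sorted(adj):
        for i, k in combinations(sorted(adj[j]), 2):
            if k in adj[i]:
                continue  # i, j, k form a triangle, not an open wedge
            e1, e2 = frozenset((i, j)), frozenset((j, k))
            if e1 in used or e2 in used:
                continue  # would share an edge with an earlier wedge
            used.update((e1, e2))
            wedges.append((i, j, k))
    return wedges

# Path 0 - 1 - 2 - 3: the wedges centred at 1 and at 2 share edge (1, 2),
# so only one of them can be chosen.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
wedges = maximal_edge_disjoint_open_wedges(path)
lower_bound = len(wedges)
```

Every open wedge is examined exactly once, so any wedge left out must share an edge with a chosen one, which makes the returned set maximal. This naive scan takes time proportional to the sum of squared degrees and is meant only to illustrate the bound.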

Abstract

Cluster deletion is an NP-hard graph clustering objective with applications in computational biology and social network analysis, where the goal is to delete a minimum number of edges to partition a graph into cliques. We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3. Moreover, we show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a vertex of maximum degree in an auxiliary graph and forming a cluster around it. One of these algorithms relies on solving a linear program. Our final contribution is to design a new and purely combinatorial approach for doing so that is far more scalable in theory and practice.
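The greedy derandomization described in the abstract can be sketched as follows. This is a minimal Python illustration, assuming the auxiliary graph is already given as an adjacency dict. The name `degree_pivot` and the tie-breaking rule are hypothetical, "maximum degree" is read here as degree among still-unclustered vertices (an assumption), and how the auxiliary graph is constructed is left to the paper.

```python
def degree_pivot(aux_adj):
    """Cluster vertices by repeatedly pivoting on a maximum-degree vertex.

    aux_adj: dict mapping each vertex to the set of its neighbours in the
             auxiliary graph (hypothetical input format).
    Returns a list of clusters, each a set of vertices.
    """
    # Work on a copy so the caller's graph is not modified.
    remaining = {v: set(nbrs) for v, nbrs in aux_adj.items()}
    clusters = []
    while remaining:
        # Assumption: degree counts unclustered neighbours only;
        # ties are broken in favour of the smallest vertex label.
        pivot = max(sorted(remaining), key=lambda v: len(remaining[v]))
        cluster = {pivot} | remaining[pivot]
        clusters.append(cluster)
        for v in cluster:
            del remaining[v]
        for nbrs in remaining.values():
            nbrs -= cluster  # clustered vertices no longer count as neighbours
    return clusters

# A star centred at 0 with leaves 1 and 2, plus a separate edge 3 - 4.
aux = {0: {1, 2}, 1: {0}, 2: {0}, 3: {4}, 4: {3}}
clusters = degree_pivot(aux)
```

The sketch recomputes degrees by scanning all remaining vertices in every round, which is quadratic in the worst case. A practical implementation would maintain degrees incrementally.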

Paper Structure (28 sections, 6 theorems, 19 equations, 5 figures, 1 table, 7 algorithms)

Introduction
Previous work.
Motivating questions.
Our contributions.
Preliminaries and Related Work
Cluster Deletion
Strong Triadic Closure Labeling
STC + Pivot Framework
Improved Approximation Analysis
Pivoting Lemma
Rounding a Disjoint Open Wedge Set
Rounding the STC LP Relaxation
Faster Algorithms for Lower Bounds
Maximal Edge-Disjoint Open Wedge Set
Combinatorial Solver for the STC LP
...and 13 more sections

Key Result

Lemma 3.1

Let $\mathcal{B}$ be the set of edges between clusters and $\mathcal{N}$ be the set of non-edges inside clusters that result from running Algorithm \ref{alg:piv}. If Pivot Strategy 1 or 2 is used, then $|\mathcal{B}| \leq 2|\mathcal{N}|$. If Pivot Strategy 3 is used, this holds in expectation: $\mathbb{E}[|\mathcal{B}|] \leq 2\,\mathbb{E}[|\mathcal{N}|]$.
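To make the two quantities in the lemma concrete, the following Python sketch counts them for a given graph and clustering. The function name and input format are hypothetical. It does not implement the pivot strategies themselves, which are not defined in this summary; it only measures $|\mathcal{B}|$ and $|\mathcal{N}|$ on whatever graph the pivoting algorithm was run on.

```python
from itertools import combinations

def count_mistakes(adj, clusters):
    """Return (|B|, |N|): edges between clusters and non-edges inside clusters.

    adj:      dict mapping each vertex to the set of its neighbours.
    clusters: list of disjoint vertex sets covering all vertices of adj.
    """
    label = {v: idx for idx, cluster in enumerate(clusters) for v in cluster}
    # Count each undirected edge once by requiring u < v.
    between = sum(
        1 for u in adj for v in adj[u]
        if u < v and label[u] != label[v]
    )
    # Count unordered vertex pairs in the same cluster that are not adjacent.
    inside_missing = sum(
        1 for cluster in clusters
        for u, v in combinations(sorted(cluster), 2)
        if v not in adj[u]
    )
    return between, inside_missing

# A star centred at 0 with leaves 1 and 2, plus a separate edge 3 - 4.
graph = {0: {1, 2}, 1: {0}, 2: {0}, 3: {4}, 4: {3}}
b, n = count_mistakes(graph, [{0, 1, 2}, {3, 4}])
```

In this example no edge crosses clusters and the only missing pair inside a cluster is (1, 2), so the inequality $|\mathcal{B}| \leq 2|\mathcal{N}|$ can be checked directly on the returned pair.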

Figures (5)

Figure 1: The example for Theorem \ref{lem:mfp-best}.
Figure 2: Approximation ratios ($|E_D|/|W|$) for MFP.
Figure 3: Runtimes of the MFP algorithms using different pivoting strategies. Each point represents one graph.
Figure 4: Improved approximation ratios when incorporating a cluster merging step after DegMFP.
Figure 5: Runtimes of two different solvers for the STC LP. Each point represents a graph. Points above the black dashed line indicate graphs for which the given STC LP solver did not find a solution. The two vertical dashed lines indicate the size of the largest graph (in terms of edges) for which each method was able to successfully solve the LP.

Theorems & Definitions (11)

Lemma 3.1
proof
Theorem 3.2
proof
Theorem 3.3
proof
Theorem 3.4
proof
Lemma 4.1
Lemma 4.2
...and 1 more
