A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering

Vincent Cohen-Addad; Tommaso d'Orsi; Aida Mousavifar

TL;DR

This work addresses beyond-worst-case graph clustering in the semi-random model with a planted bipartition and monotone perturbations. It introduces a near-linear time algorithm that achieves an O(1)-approximation to Balanced Cut, matching the guarantees of previous SDP-based approaches while improving the running time to O(|V|^{1+o(1)} + |E|^{1+o(1)}). The approach hinges on geometric expansion properties, approximate heavy-vertex removal, and a refined matrix multiplicative weights framework that uses probabilistic, width-bounded oracles and low-rank approximations to maintain efficiency. The results extend to related problems such as Sparsest Cut and to Dasgupta’s hierarchical clustering objective under semi-random hierarchical SBM inputs. This illustrates the potential for practical robust algorithms in semi-random settings and suggests directions for further efficiency gains in beyond-worst-case graph problems.

Abstract

We consider the semi-random graph model of [Makarychev, Makarychev and Vijayaraghavan, STOC'12], where, given a random bipartite graph with $\alpha$ edges and an unknown bipartition $(A, B)$ of the vertex set, an adversary can add arbitrary edges inside each community and remove arbitrary edges from the cut $(A, B)$ (i.e., all adversarial changes are \textit{monotone} with respect to the bipartition). For this model, a polynomial-time algorithm is known to approximate the Balanced Cut problem up to value $O(\alpha)$ [MMV'12] as long as the cut $(A, B)$ has size $\Omega(\alpha)$. However, it relies on slow subroutines that require optimal solutions to logarithmically many semidefinite programs. We study the fine-grained complexity of the problem and present the first near-linear time algorithm that achieves performance similar to that of [MMV'12]. Our algorithm runs in time $O(|V(G)|^{1+o(1)} + |E(G)|^{1+o(1)})$ and finds a balanced cut of value $O(\alpha)$. Our approach appears easily extendable to related problems, such as Sparsest Cut, and also yields a near-linear time $O(1)$-approximation to Dasgupta's objective function for hierarchical clustering [Dasgupta, STOC'16] for the semi-random hierarchical stochastic block model inputs of [Cohen-Addad, Kanade, Mallmann-Trenn, Mathieu, JACM'19].
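To make the input model concrete, the following is a minimal sketch of how a semi-random instance of this kind could be sampled. It is an illustration only, not the construction used in the paper: the function name `sample_semi_random_instance`, the even split into $A$ and $B$, the per-pair edge probability `eta`, and the use of random choices as a stand-in for the adversary are all assumptions made here. A real adversary may choose its monotone changes arbitrarily.

```python
import random


def sample_semi_random_instance(n, eta, extra_inside=0,
                                removed_cut_fraction=0.0, seed=0):
    """Sample a toy semi-random instance (hypothetical helper).

    Returns (A, B, edges), where edges is a set of vertex pairs.
    """
    rng = random.Random(seed)
    # Assumption: the hidden bipartition splits the vertices evenly.
    A = set(range(n // 2))
    B = set(range(n // 2, n))

    # Random step: every pair across the cut (A, B) becomes an edge
    # independently with probability eta.
    edges = set()
    for u in sorted(A):
        for v in sorted(B):
            if rng.random() < eta:
                edges.add((u, v))

    # Monotone change 1: remove edges from the cut.
    # Random removal stands in for an arbitrary adversarial choice.
    cut_edges = sorted(edges)
    to_remove = int(removed_cut_fraction * len(cut_edges))
    for e in rng.sample(cut_edges, to_remove):
        edges.remove(e)

    # Monotone change 2: add edges inside each community.
    for side in (sorted(A), sorted(B)):
        pairs = [(u, v) for i, u in enumerate(side) for v in side[i + 1:]]
        for e in rng.sample(pairs, min(extra_inside, len(pairs))):
            edges.add(e)

    return A, B, edges
```

Both adversarial steps only make the planted cut $(A, B)$ sparser relative to the rest of the graph, which is what "monotone with respect to the bipartition" refers to.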

Paper Structure (17 sections, 18 theorems, 38 equations, 5 algorithms)

Introduction
Results
Related Research
Techniques
Perspective
Organization and notation
A fast algorithm for semi-random balanced cut
The algorithm
Background
The matrix multiplicative weights method for SDPs
Approximate matrix exponentiation, robust and reliable oracles
The heavy vertices removal oracle
The fast heavy vertices removal procedure
The oracle
The semi-random hierarchical stochastic model
...and 2 more sections

Key Result

Theorem 1.2

Let $G$ be a graph over $n$ vertices generated through model:main with parameters $a>0$ and $\eta\geqslant \Omega\left(\frac{(\log n)^2 \cdot (\log\log n)^2}{n}\right)$. There exists an algorithm that, on input $G$, with probability $1-o(1)$, outputs an $\Omega(a)$-balanced cut of value at most $O(n^2\cdot \eta)$,
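The two quantities in this statement, the value of a cut and its balance, can be checked directly on a candidate output. The sketch below is illustrative only: the helper names `cut_value` and `is_balanced` are hypothetical, and the balance test assumes that a $c$-balanced cut is one in which each side contains at least $c \cdot n$ vertices, with the constant `c` playing the role hidden inside $\Omega(a)$.

```python
def cut_value(edges, side):
    """Count the edges with exactly one endpoint in `side`."""
    side = set(side)
    return sum(1 for u, v in edges if (u in side) != (v in side))


def is_balanced(side, n, c):
    """Check that both sides of the cut hold at least c * n vertices.

    Assumption: this is the notion of balance meant by "c-balanced".
    """
    k = len(set(side))
    return min(k, n - k) >= c * n


# Usage on a 4-cycle 0-1-2-3-0, cutting {0, 1} from {2, 3}.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
value = cut_value(cycle, {0, 1})       # edges (1, 2) and (3, 0) cross
balanced = is_balanced({0, 1}, 4, 0.5)
```

Both checks take time linear in the number of edges and vertices, so verifying an output is never the bottleneck compared with computing it.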

Theorems & Definitions (38)

Theorem 1.2
Theorem 1.3
Remark 2.1: On the minimum edge density $\eta$ in the cut
Definition 4.1: Heavy vertex
Definition 4.2: Geometric expansion
Theorem 4.3: Main theorem
Theorem 4.4: Geometric expansion of random graphs [MMV'12]
Lemma 4.6: [Arora, Kale, STOC'07]
Lemma 4.7: [Sherman, FOCS'09]
Lemma 4.8
...and 28 more
