A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering

Vincent Cohen-Addad, Tommaso d'Orsi, Aida Mousavifar

TL;DR

This work addresses beyond-worst-case graph clustering in the semi-random model with a planted bipartition and monotone perturbations. It introduces a near-linear time algorithm that achieves an O(1)-approximation to Balanced Cut, matching the guarantees of previous SDP-based approaches while improving the running time to O(|V|^{1+o(1)} + |E|^{1+o(1)}). The approach hinges on geometric expansion properties, approximate heavy-vertex removal, and a refined matrix multiplicative weights framework that uses probabilistic, width-bounded oracles and low-rank approximations to maintain efficiency. The results extend to related problems such as Sparsest Cut and to Dasgupta's hierarchical clustering objective under semi-random hierarchical SBM inputs, illustrating the potential for practical robust algorithms in semi-random settings and suggesting directions for further efficiency gains in beyond-worst-case graph problems.

Abstract

We consider the semi-random graph model of [Makarychev, Makarychev and Vijayaraghavan, STOC'12], where, given a random bipartite graph with $\alpha$ edges and an unknown bipartition $(A, B)$ of the vertex set, an adversary can add arbitrary edges inside each community and remove arbitrary edges from the cut $(A, B)$ (i.e. all adversarial changes are monotone with respect to the bipartition). For this model, a polynomial time algorithm is known to approximate the Balanced Cut problem up to value $O(\alpha)$ [MMV'12] as long as the cut $(A, B)$ has size $\Omega(\alpha)$. However, it consists of slow subroutines requiring optimal solutions for logarithmically many semidefinite programs. We study the fine-grained complexity of the problem and present the first near-linear time algorithm that achieves guarantees similar to those of [MMV'12]. Our algorithm runs in time $O(|V(G)|^{1+o(1)} + |E(G)|^{1+o(1)})$ and finds a balanced cut of value $O(\alpha)$. Our approach appears easily extendable to related problems, such as Sparsest Cut, and also yields a near-linear time $O(1)$-approximation to Dasgupta's objective function for hierarchical clustering [Dasgupta, STOC'16] for the semi-random hierarchical stochastic block model inputs of [Cohen-Addad, Kanade, Mallmann-Trenn, Mathieu, JACM'19].
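To make the input model concrete, the following is a minimal sketch of how a toy semi-random instance could be sampled. It is an illustration under stated assumptions, not the construction used in the paper: the function name `semi_random_instance` and the parameters `a`, `eta`, `intra_edges` and `removed_fraction` are hypothetical, each cross pair is assumed to appear independently with probability `eta`, and the adversary is replaced by a random stand-in that only performs monotone changes (deleting cut edges, adding edges inside a side).

```python
import random


def semi_random_instance(n, a, eta, intra_edges, removed_fraction, seed=0):
    """Sample a toy instance with a planted bipartition (A, B).

    Hypothetical illustration: the "adversary" below is random, whereas
    the model allows any monotone changes chosen adversarially.
    """
    rng = random.Random(seed)
    size_a = max(1, int(a * n))
    A = list(range(size_a))
    B = list(range(size_a, n))

    # Random step (assumption): each pair across the cut is an edge
    # independently with probability eta.
    cut_edges = [(u, v) for u in A for v in B if rng.random() < eta]

    # Monotone change 1: remove some edges from the cut (A, B).
    kept = {e for e in cut_edges if rng.random() >= removed_fraction}

    # Monotone change 2: add edges inside each community.
    inside = set()
    for side in (A, B):
        if len(side) < 2:
            continue
        for _ in range(intra_edges):
            u, v = rng.sample(side, 2)
            inside.add((min(u, v), max(u, v)))

    return set(A), set(B), kept, inside


A, B, cut, inside = semi_random_instance(
    n=40, a=0.5, eta=0.2, intra_edges=30, removed_fraction=0.25, seed=1
)
print(len(A), len(B), len(cut), len(inside))
```

Both kinds of change can only make the planted cut look sparser relative to the rest of the graph, which is what "monotone with respect to the bipartition" refers to.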

Paper Structure

This paper contains 17 sections, 18 theorems, 38 equations, 5 algorithms.

Key Result

Theorem 1.2

Let $G$ be a graph over $n$ vertices generated through the semi-random model (model:main) with parameters $a>0$ and $\eta\geqslant \Omega\left(\frac{(\log n)^2 \cdot (\log\log n)^2}{n}\right)$. There exists an algorithm that, on input $G$, with probability $1-o(1)$ outputs an $\Omega(a)$-balanced cut of value at most $O(n^2\cdot \eta)$.
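The two quantities in the statement, the value of a cut and its balance, can be checked directly on a candidate solution. The sketch below assumes that the value of a cut is the number of edges crossing it and that a cut is $b$-balanced when both sides contain at least $b \cdot n$ vertices; the helper names `cut_value` and `balance` are hypothetical and not taken from the paper.

```python
def cut_value(edges, side):
    """Number of edges with exactly one endpoint in `side`."""
    return sum((u in side) != (v in side) for u, v in edges)


def balance(n, side):
    """Fraction of the n vertices lying on the smaller side of the cut."""
    return min(len(side), n - len(side)) / n


# Toy graph on 6 vertices: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
side = {0, 1, 2}

print(cut_value(edges, side))  # 1
print(balance(6, side))        # 0.5
```

In this toy example the cut separating the two triangles has value 1 and is perfectly balanced, whereas isolating vertex 0 alone gives value 2 with balance 1/6.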

Theorems & Definitions (38)

  • Theorem 1.2
  • Theorem 1.3
  • Remark 2.1: On the minimum edge density $\eta$ in the cut
  • Definition 4.1: Heavy vertex
  • Definition 4.2: Geometric expansion
  • Theorem 4.3: Main theorem
  • Theorem 4.4: Geometric expansion of random graphs [Makarychev, Makarychev and Vijayaraghavan, STOC'12]
  • Lemma 4.6: [Arora, Kale, STOC'07]
  • Lemma 4.7: [Sherman, FOCS'09]
  • Lemma 4.8
  • ...and 28 more