A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering
Vincent Cohen-Addad, Tommaso d'Orsi, Aida Mousavifar
TL;DR
This work addresses beyond-worst-case graph clustering in the semi-random model with a planted bipartition and monotone perturbations. It introduces a near-linear time algorithm that achieves an O(1)-approximation to Balanced Cut, with guarantees comparable to those of previous SDP-based approaches but a substantially improved running time of O(|V|^{1+o(1)} + |E|^{1+o(1)}). The approach hinges on geometric expansion properties, approximate heavy-vertex removal, and a refined matrix multiplicative weights framework that uses probabilistic, width-bounded oracles and low-rank approximations to maintain efficiency. The results extend to related problems such as Sparsest Cut and to Dasgupta’s hierarchical clustering objective under semi-random hierarchical SBM inputs, illustrating the potential for practical robust algorithms in semi-random settings and suggesting directions for further efficiency gains in beyond-worst-case graph problems.
Abstract
We consider the semi-random graph model of [Makarychev, Makarychev and Vijayaraghavan, STOC'12], where, given a random bipartite graph with $α$ edges and an unknown bipartition $(A, B)$ of the vertex set, an adversary can add arbitrary edges inside each community and remove arbitrary edges from the cut $(A, B)$ (i.e., all adversarial changes are \textit{monotone} with respect to the bipartition). For this model, a polynomial-time algorithm is known to approximate the Balanced Cut problem up to value $O(α)$ [MMV'12] as long as the cut $(A, B)$ has size $Ω(α)$. However, it relies on slow subroutines that require optimal solutions to logarithmically many semidefinite programs. We study the fine-grained complexity of the problem and present the first near-linear time algorithm that achieves guarantees similar to those of [MMV'12]. Our algorithm runs in time $O(|V(G)|^{1+o(1)} + |E(G)|^{1+o(1)})$ and finds a balanced cut of value $O(α)$. Our approach appears easily extendable to related problems, such as Sparsest Cut, and also yields a near-linear time $O(1)$-approximation to Dasgupta's objective function for hierarchical clustering [Dasgupta, STOC'16] for the semi-random hierarchical stochastic block model inputs of [Cohen-Addad, Kanade, Mallmann-Trenn, Mathieu, JACM'19].
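To make the input model concrete, the following is a minimal Python sketch of how a semi-random instance might be generated. It is an illustration only and not the algorithm of the paper. The function names, the edge probability `p`, and the parameters `n_add` and `n_remove` are assumptions introduced here. The stand-in adversary picks its changes at random, whereas the model allows an arbitrary choice of monotone changes.

```python
import random
from itertools import combinations

def planted_bipartite(n, p, rng):
    """Random bipartite graph between A = {0..n/2-1} and B = {n/2..n-1}.

    Each of the |A|*|B| possible cut edges is kept with probability p
    (p is an illustrative parameter, not taken from the paper).
    """
    half = n // 2
    A, B = range(half), range(half, n)
    edges = {(u, v) for u in A for v in B if rng.random() < p}
    return set(A), set(B), edges

def monotone_perturb(A, B, edges, n_add, n_remove, rng):
    """Stand-in adversary restricted to monotone changes.

    It removes up to n_remove edges crossing (A, B) and adds up to n_add
    edges inside A or inside B. A real adversary may choose these edges
    arbitrarily; here they are sampled at random for illustration.
    """
    cut_edges = sorted(edges)
    removed = set(rng.sample(cut_edges, min(n_remove, len(cut_edges))))
    inside = list(combinations(sorted(A), 2)) + list(combinations(sorted(B), 2))
    added = set(rng.sample(inside, min(n_add, len(inside))))
    return (edges - removed) | added

def cut_value(edges, side):
    """Number of edges with exactly one endpoint in `side`."""
    return sum((u in side) != (v in side) for u, v in edges)

rng = random.Random(0)
A, B, random_edges = planted_bipartite(20, 0.3, rng)
perturbed = monotone_perturb(A, B, random_edges, n_add=15, n_remove=5, rng=rng)

alpha = len(random_edges)             # edges of the random bipartite graph
planted_cut = cut_value(perturbed, A)  # value of the planted cut after perturbation
print(alpha, planted_cut)
```

Because every change is monotone, the planted cut can only get cheaper: `planted_cut` never exceeds `alpha`, while the edges added inside the communities can hide the bipartition from simple heuristics.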
