Gap-Free Clustering: Sensitivity and Robustness of SDP

Matthew Zurek; Yudong Chen

TL;DR

The paper develops a gap-free SDP framework for clustering in the unbalanced SBM, showing that large clusters can be exactly recovered without requiring a size gap between recovered and non-recovered clusters. It introduces a leave-one-out-style perturbation analysis and an improved eigenvalue perturbation bound to handle mid-sized clusters near the recovery threshold, yielding block-diagonal SDP solutions whose blocks are rank-zero or rank-one. The framework extends to semirandom perturbations, recursive clustering, and clustering with a faulty oracle, where it achieves improved query complexities, including $o(n^2)$ sample complexity even with many small clusters, together with instance-adaptive guarantees.

Abstract

We study graph clustering in the Stochastic Block Model (SBM) in the presence of both large clusters and small, unrecoverable clusters. Previous convex relaxation approaches achieving exact recovery do not allow any small clusters of size $o(\sqrt{n})$, or require a size gap between the smallest recovered cluster and the largest non-recovered cluster. We provide an algorithm based on semidefinite programming (SDP) which removes these requirements and provably recovers large clusters regardless of the remaining cluster sizes. Mid-sized clusters pose unique challenges to the analysis, since their proximity to the recovery threshold makes them highly sensitive to small noise perturbations and precludes a closed-form candidate solution. We develop novel techniques, including a leave-one-out-style argument which controls the correlation between SDP solutions and noise vectors even when the removal of one row of noise can drastically change the SDP solution. We also develop improved eigenvalue perturbation bounds of potential independent interest. Our results are robust to certain semirandom settings that are challenging for alternative algorithms. Using our gap-free clustering procedure, we obtain efficient algorithms for the problem of clustering with a faulty oracle with superior query complexities, notably achieving $o(n^2)$ sample complexity even in the presence of a large number of small clusters. Our gap-free clustering procedure also leads to improved algorithms for recursive clustering.
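To make the setting concrete, here is a minimal sketch of sampling a graph from an SBM with one large cluster and several small ones. The function name `sample_sbm`, the edge probabilities `p` and `q`, and the cluster sizes are illustrative assumptions, not values from the paper.

```python
import random


def sample_sbm(sizes, p, q, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic block model.

    sizes: list of cluster sizes (hypothetical values chosen by the caller)
    p:     within-cluster edge probability (assumed larger than q)
    q:     between-cluster edge probability
    Returns (adjacency matrix as list of lists, ground-truth labels).
    """
    rng = random.Random(seed)
    # Node i belongs to cluster labels[i]; clusters occupy contiguous index ranges.
    labels = [k for k, s in enumerate(sizes) for _ in range(s)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                adj[i][j] = adj[j][i] = 1  # undirected graph, no self-loops
    return adj, labels


# One large cluster plus several small ones (illustrative sizes only).
sizes = [40, 5, 5, 5, 5]
adj, labels = sample_sbm(sizes, p=0.8, q=0.1)
```

In the regime the abstract describes, the small clusters would be too small to recover, and the goal is to recover the large ones regardless of how the remaining sizes are distributed.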

Paper Structure (39 sections, 47 theorems, 213 equations, 1 figure, 1 table, 4 algorithms)

Introduction
Problem Setup and Prior Art
Algorithms and Prior Results for Unbalanced SBM
Our Contributions
Semirandom Robustness
Recursive Clustering
Clustering With a Faulty Oracle
Eigenvalue Perturbation Bounds
Notation
Proof Outline of Main Theorem
Setting up primal-dual witness argument
Analyzing oracle SDPs
Showing relaxed oracle SDP has rank-one solution
Setting up leave-one-out technique to control noise correlation
Showing leave-one-out solution is close to relaxed oracle SDP solution
...and 24 more sections

Key Result

Theorem 1

If two consecutive cluster sizes $\overline{s} > \underline{s}$ satisfy [condition missing from this extract], then with high probability the solution to the recovery SDP \ref{eq:SDP_regularized_original} with a suitable $\lambda$ is of the form [display missing from this extract]. That is, all big clusters are exactly recovered and all small clusters are completely ignored.
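The closing sentence of the theorem can be illustrated with a small sketch. It builds a block-diagonal 0/1 matrix with an all-ones block for every cluster at or above a size cutoff and a zero block for every smaller cluster, then reads the clusters back off. The names `ideal_solution` and `read_clusters` and the `threshold` parameter are hypothetical; the exact solution form and the size condition are those of the paper's theorem, which are not reproduced in this extract.

```python
def ideal_solution(sizes, threshold):
    """Block-diagonal 0/1 matrix for contiguous clusters.

    Clusters with size >= threshold get an all-ones (rank-one) block;
    smaller clusters get an all-zero (rank-zero) block.
    `threshold` is an assumed cutoff lying between the two consecutive
    cluster sizes named in the theorem.
    """
    n = sum(sizes)
    X = [[0] * n for _ in range(n)]
    start = 0
    for s in sizes:
        if s >= threshold:
            for i in range(start, start + s):
                for j in range(start, start + s):
                    X[i][j] = 1
        start += s
    return X


def read_clusters(X):
    """Group nodes whose rows are identical and nonzero.

    Nodes with an all-zero row (ignored small clusters) stay unassigned.
    """
    groups = {}
    for i, row in enumerate(X):
        if any(row):
            groups.setdefault(tuple(row), []).append(i)
    return sorted(groups.values())


X = ideal_solution([4, 3, 1, 1], threshold=3)
clusters = read_clusters(X)  # [[0, 1, 2, 3], [4, 5, 6]]
```

Nodes 7 and 8 belong to the size-one clusters and are left out of `clusters`, mirroring "completely ignored" in the statement.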

Figures (1)

Figure 1: SDP solutions and effects of regularization $\lambda$. (a): Ground truth clusters of sizes 300, 150, and 50 ($\times 5$). (b)&(c): Solutions to SDP \ref{eq:SDP_regularized_original} with different $\lambda$. Nonzero off-block-diagonal entries are highlighted in pink; other entries shown in grayscale ($\text{white}=0,\text{black}=1$).
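The diagnostic highlighted in the figure (nonzero entries outside the ground-truth blocks) can be computed as in the sketch below. It reads the caption's sizes as one cluster of 300, one of 150, and five of 50 (700 nodes in total). The candidate matrix `X` is a hand-built stand-in, not an actual SDP solution, and the helper names and tolerance `tol` are assumptions.

```python
def labels_from_sizes(sizes):
    """Assign contiguous index ranges to clusters 0, 1, 2, ..."""
    return [k for k, s in enumerate(sizes) for _ in range(s)]


def off_block_nonzeros(X, labels, tol=1e-8):
    """Count entries with |X[i][j]| > tol where i and j are in different clusters."""
    n = len(labels)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if labels[i] != labels[j] and abs(X[i][j]) > tol
    )


# Sizes as read from the caption: 300, 150, and five clusters of 50.
sizes = [300, 150] + [50] * 5
labels = labels_from_sizes(sizes)
n = len(labels)

# Hypothetical candidate: ones on the two largest blocks, zeros elsewhere.
X = [
    [1.0 if labels[i] == labels[j] and labels[i] < 2 else 0.0 for j in range(n)]
    for i in range(n)
]
clean = off_block_nonzeros(X, labels)

# Perturb one symmetric off-block pair to mimic a "pink" entry.
X[0][n - 1] = X[n - 1][0] = 0.3
leaky = off_block_nonzeros(X, labels)
```

Here `clean` counts no off-block entries, while `leaky` counts the two perturbed entries, which is the kind of deviation the pink highlighting makes visible.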

Theorems & Definitions (86)

Theorem 1: Informal version of Theorem \ref{thm:clustering_with_a_gap}
Theorem 2
Remark 3
Theorem 4
Theorem 5: Informal version of Theorem \ref{thm:recursive_clsutering}
Theorem 6
Theorem 7
Theorem 8
Theorem 9
Theorem 10
...and 76 more
