Robust Algorithms for Finding Cliques in Random Intersection Graphs via Sum-of-Squares

Andreas Göbel, Janosch Ruff, Leon Schiller

TL;DR

This work investigates dense random intersection graphs in which many overlapping cliques are planted, and edges outside the planted cliques are noisy. The authors develop a proofs-to-algorithms framework powered by the sum-of-squares (SoS) hierarchy to achieve exact and approximate recovery of ground-truth cliques, while proving robust identifiability via a single-label clique theorem. They show that exact recovery is possible in polynomial time against monotone adversaries when the planted clique size satisfies ${k \gg \sqrt{n\log n}}$, and they obtain near-optimal approximate recovery under up to ${\varepsilon k^2}$ edge corruptions; they also derive constant-degree SoS certificates that refute large cliques and certify the absence of extraneous large cliques. The results reveal computational-statistical and detection-recovery gaps in certain dense regimes and establish robust, certifiable guarantees for both recovery and refutation, positioning SoS as a powerful tool for overlapping community detection in high-dimensional latent-structure models. The techniques integrate balancedness certificates, neighborhood-reduction arguments, and pseudo-concentration to handle adversaries, offering a path toward certifiable recovery in complex overlapping-structure graphs with practical algorithmic implications.

Abstract

We study efficient algorithms for recovering cliques in dense random intersection graphs (RIGs). In this model, $d = n^{\Omega(1)}$ cliques of size approximately $k$ are randomly planted by choosing the vertices to participate in each clique independently with probability $\delta$. While there has been extensive work on recovering one or multiple disjointly planted cliques in random graphs, the natural extension of this question to recovering overlapping cliques has been, surprisingly, largely unexplored. Moreover, because every vertex can be part of polynomially many cliques, this task is significantly harder than in the case of disjointly planted cliques (as recently studied by Kothari, Vempala, Wein and Xu [COLT'23]), which manifests in the failure of simple combinatorial and even spectral algorithms. In this work we obtain the first efficient algorithms for recovering the community structure of RIGs, from the perspective of both exact and approximate recovery. Our algorithms are further robust to noise, monotone adversaries, and a certain, optimal number of edge corruptions, and they work whenever $k \gg \sqrt{n \log(n)}$. Our techniques follow the proofs-to-algorithms framework utilizing the sum-of-squares hierarchy.
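To make the planting process in the abstract concrete, here is a minimal sketch of a sampler for a toy random intersection graph. It is an illustration only, not the authors' code: the function name `sample_rig` and the `noise` parameter are hypothetical, and `noise` is a simple stand-in for "noisy edges outside the planted cliques" rather than the paper's own parametrization $\textsc{RIG}(n, d, p, q)$, which is not spelled out on this page. Each of the $d$ cliques has roughly $\delta n$ members in expectation, matching the "size approximately $k$" description.

```python
import random
from itertools import combinations


def sample_rig(n, d, delta, noise=0.0, seed=None):
    """Sample a toy random intersection graph (illustrative sketch).

    n      number of vertices
    d      number of planted cliques (labels)
    delta  probability that a vertex joins a given clique
    noise  hypothetical probability of adding each remaining non-edge
           (an assumption, not the paper's noise model)

    Returns (cliques, edges): a list of d vertex sets and a set of
    edges stored as pairs (u, v) with u < v.
    """
    rng = random.Random(seed)

    # Each vertex joins each clique independently with probability delta.
    cliques = [{v for v in range(n) if rng.random() < delta} for _ in range(d)]

    edges = set()
    for members in cliques:
        # Two vertices are adjacent whenever they share a clique label.
        edges.update(combinations(sorted(members), 2))

    if noise > 0:
        # Add each pair not covered by a planted clique with probability noise.
        for u, v in combinations(range(n), 2):
            if (u, v) not in edges and rng.random() < noise:
                edges.add((u, v))

    return cliques, edges


cliques, edges = sample_rig(n=60, d=4, delta=0.2, noise=0.05, seed=0)
print([len(c) for c in cliques], len(edges))
```

Because a vertex can carry several labels, the planted cliques overlap. The recovery task described above is to reconstruct `cliques` from `edges` alone.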

Paper Structure

This paper contains 91 sections, 72 theorems, 202 equations, 2 figures, 2 algorithms.

Key Result

Theorem 1.3

There exists a polynomial-time algorithm that, on input $G \sim \textsc{RIG}(n, d, p, q)$, possibly modified by a monotone adversary, achieves exact recovery whenever the parameters $p, d, q$ are such that $d \geqslant n^{\Omega(1)}$ (footnote: "In this work, the notation $d \gg n^{\Omega(1)}$ means that the resul…")

Figures (2)

  • Figure 1: Sketch for \ref{alg:splitting-one-sided}: (left) The bipartite subgraph $H(T) = G[(U\setminus S_{\ell}) \uplus V] \cap N_G(T)$ is balanced, as shown in \ref{lem:general-balancedness}. (right) Sketch of step one of the proof of identifiability for our SoS-based proof of \ref{thm:slct} and our refutation algorithm (\ref{thm:algorithmicrefutation}).
  • Figure 2: (a) Sketch of the event $\mathcal{E}_{a,b}$. $S_\ell$ (yellow area) is the set of vertices with label $\ell$. The set of vertices $S$ (hatched area) is a subset of $S_{\ell}$, while no vertex of $T$ (red area) has label $\ell$. Every vertex of $T$ has all possible edges to the set $S$. (b) Illustration of the three label-revealing phases in \ref{lem:bad-event}: 1 (yellow), 2 (red) and 3 (blue). The grey area represents a set $R$ of $b$ vertices with bounded duplicate labels, as in \ref{claim:duplicates}.

Theorems & Definitions (161)

  • Definition 1.1: Random Intersection Graphs with Noise
  • Definition 1.2: Exact and approximate recovery in RIGs
  • Theorem 1.3: Exact recovery against a monotone adversary
  • Remark 1.4: Optimality of the recovery guarantees
  • Theorem 1.5: Approximate recovery under edge corruptions
  • Remark 1.6: Optimality of the recovery guarantees
  • Theorem 1.7: Single label clique theorem
  • Theorem 2.1
  • Lemma 3.1: Bernstein's Concentration Bound
  • Lemma 3.1
  • ...and 151 more