Stochastic Block Model for Hypergraphs: Statistical limits and a semidefinite programming approach

Chiheon Kim, Afonso S. Bandeira, Michel X. Goemans

TL;DR

The paper establishes a sharp information-theoretic threshold for exact recovery in the stochastic block model on $k$-uniform hypergraphs, showing that recovery is possible via ML when $I(\alpha,\beta)>1$ and impossible when $I(\alpha,\beta)<1$. It introduces a tractable truncate-and-relax SDP algorithm and proves exact recovery under a related, parameter-dependent threshold $I_{sdp}(\alpha,\beta)>1$, while also giving a complementary lower bound $I_2(\alpha,\beta)$ governing the algorithm’s failure; simulations indicate the truncation threshold aligns with $I_2(\alpha,\beta)=1$, suggesting a gap between statistical limits and SDP performance for larger $k$. The results generalize two-community graph SBM phase transitions to hypergraphs and quantify the trade-off between statistical limits and computationally efficient recovery methods. Overall, the work clarifies when convex relaxations can attain the information-theoretic limits and where computational gaps persist, thereby guiding future algorithm design for higher-order relational data.

Abstract

We study the problem of community detection in a random hypergraph model which we call the stochastic block model for $k$-uniform hypergraphs ($k$-SBM). We investigate the exact recovery problem in $k$-SBM and show that a sharp phase transition occurs around a threshold: below the threshold it is impossible to recover the communities with non-vanishing probability, yet above the threshold there is an estimator which recovers the communities almost asymptotically surely. We also consider a simple, efficient algorithm for the exact recovery problem which is based on a semidefinite relaxation technique.

Paper Structure

This paper contains 21 sections, 19 theorems, 182 equations, 2 figures.

Key Result

Theorem 1

Exact recovery in $\mathsf{HSBM}(n,p,q;k)$ is possible if $I(\alpha,\beta)>1$, and impossible if $I(\alpha,\beta)<1$, where $I(\alpha,\beta) = \frac{1}{2^{k-1}}(\sqrt{\alpha}-\sqrt{\beta})^2$.
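
As a rough illustration of how the threshold in Theorem 1 can be evaluated, the sketch below computes $I(\alpha,\beta)$ for given parameters and reports which side of the threshold they fall on. The function names `recovery_information` and `recovery_regime` are hypothetical and not taken from the paper; the code only encodes the formula stated above and makes no claim about the boundary case $I(\alpha,\beta)=1$, which the theorem as quoted does not cover.

```python
import math


def recovery_information(alpha: float, beta: float, k: int) -> float:
    """Evaluate I(alpha, beta) = (sqrt(alpha) - sqrt(beta))^2 / 2^(k-1)."""
    if k < 2:
        raise ValueError("k must be at least 2 (hyperedges need at least 2 vertices)")
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    return (math.sqrt(alpha) - math.sqrt(beta)) ** 2 / 2 ** (k - 1)


def recovery_regime(alpha: float, beta: float, k: int) -> str:
    """Classify (alpha, beta, k) relative to the threshold I(alpha, beta) = 1."""
    value = recovery_information(alpha, beta, k)
    if value > 1:
        return "possible"
    if value < 1:
        return "impossible"
    # The theorem as quoted is silent on the exact boundary.
    return "boundary"


if __name__ == "__main__":
    # Same (alpha, beta), different uniformity k: the factor 2^(k-1)
    # makes the condition harder to satisfy as k grows.
    for k in (2, 6):
        value = recovery_information(9.0, 1.0, k)
        print(f"k={k}: I={value}, exact recovery {recovery_regime(9.0, 1.0, k)}")
```

With $\alpha=9$ and $\beta=1$, the value is $2$ for $k=2$ (recovery possible) and $0.125$ for $k=6$ (recovery impossible), which shows how quickly the $2^{k-1}$ factor raises the required separation between $\sqrt{\alpha}$ and $\sqrt{\beta}$.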

Figures (2)

  • Figure 1: Visualization of $I$, $I_2$, $I_{sdp}$ when $k=6$: (a) the solid line represents $I(\alpha,\beta)=1$, (b) the circled line represents $I_2(\alpha,\beta)=1$, and (c) the x-marked line represents $I_{sdp}(\alpha,\beta)=1$. The dashed black line is the graph of $\alpha = \beta$.
  • Figure 2: Simulation results for the truncate-and-relax algorithm with $k=6$ and $n=500$. Each gray-scale block corresponds to a pair $(\alpha,\beta)$, and its shade denotes the success rate over 30 trials (black corresponds to zero successes, and brighter shades correspond to higher success rates). The solid line represents $I(\alpha,\beta)=1$, the circled line represents $I_2(\alpha,\beta)=1$, and the x-marked line represents $I_{sdp}(\alpha,\beta)=1$.

Theorems & Definitions (30)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Conjecture 1.2
  • Proposition 1
  • Theorem 4
  • proof
  • Lemma 1
  • Lemma 2
  • ...and 20 more