Fundamental Limits of Community Detection in Contextual Multi-Layer Stochastic Block Models

Shuyang Gong, Dong Huang, Zhangsong Li

TL;DR

This work studies community detection from the joint observation of a covariate matrix and multiple sparse graphs in the constant-degree regime, establishing sharp information-theoretic thresholds for detection and label estimation. It introduces a unified contextual multi-layer SBM and derives a threshold function $F$: strong detection and weak recovery are information-theoretically impossible when $F<1$, while efficient algorithms achieve both when $F>1$. Consequently, there is no statistical–computational gap when the number of layers is constant. On the algorithmic side, it designs efficient detectors and weak-recovery estimators based on counting decorated cycles and decorated paths, using color-coding to obtain polynomial-time implementations, and proves that these achieve the sharp thresholds. The authors also develop an information-theoretic lower bound framework, including a novel Bernoulli–Gaussian moment comparison and a recovery-to-detection reduction, and corroborate the theory with numerical experiments in sparse multi-layer settings. Overall, the paper advances the understanding of multi-modal network inference in sparse, noisy settings and provides threshold-achieving algorithms for joint contextual detection and recovery.

Abstract

We consider the problem of community detection from the joint observation of a high-dimensional covariate matrix and $L$ sparse networks, all encoding noisy, partial information about the latent community labels of $n$ subjects. In the asymptotic regime where the networks have constant average degree and the number of features $p$ grows proportionally with $n$, we derive a sharp threshold that determines when detecting and estimating the subject labels is possible. Our results extend the work of \cite{MN23} to the constant-degree regime with noisy measurements, and also resolve a conjecture in \cite{YLS24+} when the number of networks is a constant. Our information-theoretic lower bound is obtained via a novel comparison inequality between Bernoulli and Gaussian moments, as well as a statistical variant of the "recovery to chi-square divergence reduction" argument inspired by \cite{DHSS25}. On the algorithmic side, we design efficient algorithms based on counting decorated cycles and decorated paths and prove that they achieve the sharp threshold for both detection and weak recovery. In particular, our results show that there is no statistical-computational gap in this setting.

Paper Structure

This paper contains 28 sections, 24 theorems, 231 equations, and 5 figures.

Key Result

Theorem 1.5

Suppose that $L=O(1)$ and $F(\mu,\rho,\gamma,\{ \lambda_{\ell} \},\{ \epsilon_{\ell} \})<1$. Then strong detection and weak recovery are information-theoretically impossible. On the contrary, suppose that $L=O(1)$ and $F(\mu,\rho,\gamma,\{ \lambda_{\ell} \},\{ \epsilon_{\ell} \})>1$. Then there exis

Figures (5)

  • Figure 1: Proof outline for the impossibility of weak recovery
  • Figure 2: Decorated cycle and path: disks denote vertices in $V^{\mathsf a}(H)$, squares denote vertices in $V^{\mathsf b}(H)$, and the edge colors (among $L$ total colors) indicate which graph $\bm{G}_\ell$ the edge is drawn from, $\ell \in [L]$.
  • Figure 3: An example of decorated paths $S$ and $K$ with $E(S)\cap E(K) = E(S\Cap K)$, together with $S_{\cap}$, $S_{\setminus}$, and $K_{\setminus}$, for $\mathtt T=2$.
  • Figure 4: ROC curves for the contextual $2$-layer SBMs with $n=100$, $p=50$, $\gamma=2$ and $F(\mu,\rho,\gamma;\{\lambda_\ell\},\{\epsilon_\ell\})\in\{0.75,1.25,1.75, 2.25\}$.
  • Figure 5: Cosine similarity $\frac{|\langle\hat{\Phi},\bm x\bm x^\top \rangle|}{\|\hat{\Phi} \|_{\operatorname{F}}\|\bm x\bm x^\top\|_{\operatorname{F}}}$ with $n=100$, $p=50$, $L=2$ and $F(\mu,\rho,\gamma;\{\lambda_\ell\},\{\epsilon_\ell\})\in\{0.3,0.5,0.7,0.9,1.1,1.3,1.5,1.7\}$.
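
The cosine similarity in the Figure 5 caption can be computed directly from its formula. The following is a minimal sketch, assuming $\hat{\Phi}$ is supplied as an $n \times n$ NumPy array and $\bm x$ as a length-$n$ vector of $\pm 1$ labels; the function name `cosine_similarity_to_labels` is hypothetical and is not taken from the paper.

```python
import numpy as np


def cosine_similarity_to_labels(phi_hat, x):
    """Return |<phi_hat, x x^T>| / (||phi_hat||_F * ||x x^T||_F).

    phi_hat: (n, n) array, an estimate of the label outer product (assumed input).
    x:       length-n vector of community labels, e.g. entries in {-1, +1}.
    """
    phi_hat = np.asarray(phi_hat, dtype=float)
    x = np.asarray(x, dtype=float)
    target = np.outer(x, x)                      # the matrix x x^T
    inner = np.sum(phi_hat * target)             # Frobenius inner product
    denom = np.linalg.norm(phi_hat, "fro") * np.linalg.norm(target, "fro")
    return abs(inner) / denom


x = np.array([1, -1, 1, -1])
perfect = cosine_similarity_to_labels(np.outer(x, x), x)
flipped = cosine_similarity_to_labels(-np.outer(x, x), x)
identity = cosine_similarity_to_labels(np.eye(4), x)
```

The absolute value makes the score invariant to a global sign flip of the estimate, so a perfect estimate and its negation both score 1, while an uninformative estimate scores lower.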

Theorems & Definitions (49)

  • Definition 1.1: Stochastic block model
  • Definition 1.2: Contextual multi-layer SBM
  • Definition 1.3
  • Definition 1.4
  • Theorem 1.5
  • Remark 1.6
  • Remark 1.7
  • Remark 1.8
  • Proposition 2.1
  • Remark 2.2
  • ...and 39 more
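
As a rough illustration of the multi-layer setting with constant average degree, the sketch below samples several sparse two-community graphs that share one label vector. It assumes the standard symmetric two-community parameterization, with within-community edge probability $d(1+b)/n$ and cross-community probability $d(1-b)/n$. The parameter names `avg_degrees` and `biases` and the function `sample_sbm_layers` are hypothetical; they are not the paper's $\lambda_\ell$ and $\epsilon_\ell$, and Definitions 1.1 and 1.2 should be consulted for the exact model.

```python
import numpy as np


def sample_sbm_layers(n, avg_degrees, biases, seed=0):
    """Sample shared +/-1 labels and one sparse symmetric SBM graph per layer.

    Assumed parameterization (not from the paper): in a layer with average
    degree d and bias b, same-community pairs connect with probability
    d*(1+b)/n and different-community pairs with probability d*(1-b)/n.
    """
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=n)              # latent community labels
    same = x[:, None] == x[None, :]              # True where labels agree
    layers = []
    for d, b in zip(avg_degrees, biases):
        probs = np.where(same, d * (1 + b) / n, d * (1 - b) / n)
        # Draw each unordered pair once (strict upper triangle), then mirror.
        upper = np.triu(rng.random((n, n)) < probs, k=1)
        layers.append((upper | upper.T).astype(int))
    return x, layers


labels, graphs = sample_sbm_layers(n=400, avg_degrees=[5.0, 3.0], biases=[0.5, 0.2])
mean_degrees = [g.sum(axis=1).mean() for g in graphs]
```

Because the edge probabilities scale as $1/n$, the mean degree stays bounded as $n$ grows, which is the sparse regime the abstract refers to.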