Fundamental limits of community detection from multi-view data: multi-layer, dynamic and partially labeled block models

Xiaodong Yang, Buyu Lin, Subhabrata Sen

TL;DR

This work develops a unified, information-theoretic framework for community detection in multi-view networks, encompassing multilayer, dynamic, and semi-supervised SBMs. It derives the asymptotic mutual information and MMSE via a Gaussian spiked-matrix surrogate, proves universality between graph data and Gaussian models under large degrees, and establishes sharp weak-recovery thresholds for several settings. A coupled Approximate Message Passing (AMP) algorithm is proposed, with state evolution that rigorously describes its performance and a demonstration of algorithmic universality across models. The results yield actionable insights into when and how multi-view data enable reliable community detection, and they provide a principled basis for efficient, scalable inference in complex networks.

Abstract

Multi-view data arises frequently in modern network analysis, e.g., relations of multiple types among individuals in social network analysis, longitudinal measurements of interactions among observational units, and annotated networks with noisy partial labeling of vertices. We study community detection in these disparate settings via a unified theoretical framework, and investigate the fundamental thresholds for community recovery. We characterize the mutual information between the data and the latent parameters, provided the degrees are sufficiently large. Based on this general result, we (i) derive a sharp threshold for community detection in an inhomogeneous multilayer block model \citep{chen2022global}, (ii) characterize a sharp threshold for weak recovery in a dynamic stochastic block model \citep{matias2017statistical}, and (iii) identify the limiting mutual information in an unbalanced partially labeled block model. Our first two results are derived modulo coordinate-wise convexity assumptions on specific functions; we provide extensive numerical evidence for the correctness of these assumptions. Finally, we introduce iterative algorithms based on Approximate Message Passing for community detection in these problems.
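To make the Approximate Message Passing idea concrete, the following is a minimal sketch for the simplest relative of the models above: a single-view, rank-one Gaussian spiked matrix with balanced $\pm 1$ community labels. It is not the coupled multi-view algorithm of the paper. The function names `spiked_wigner` and `amp`, the normalization $A = (\sqrt{\lambda}/n)\,xx^\top + W/\sqrt{n}$, the $\tanh$ denoiser, and the weakly informative initialization are all assumptions made for illustration.

```python
import numpy as np

def spiked_wigner(n, lam, rng):
    """Sample labels x in {-1,+1}^n and A = sqrt(lam)/n * x x^T + W/sqrt(n)."""
    x = rng.choice([-1.0, 1.0], size=n)
    g = rng.normal(size=(n, n))
    w = (g + g.T) / np.sqrt(2.0)  # symmetric noise, unit off-diagonal variance
    a = np.sqrt(lam) / n * np.outer(x, x) + w / np.sqrt(n)
    return x, a

def amp(a, lam, x0, iters=20):
    """Single-view AMP with denoiser f(u) = tanh(sqrt(lam) * u)."""
    s = np.sqrt(lam)
    xt = x0.copy()
    f_prev = np.zeros_like(x0)
    for _ in range(iters):
        f = np.tanh(s * xt)
        # Onsager coefficient: empirical mean of f'(u) = s * (1 - tanh(s*u)^2)
        b = s * np.mean(1.0 - f ** 2)
        # Matrix-vector step minus the Onsager memory term
        xt, f_prev = a @ f - b * f_prev, f
    return np.tanh(s * xt)  # approximate posterior means of the labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, lam = 1500, 4.0
    x, a = spiked_wigner(n, lam, rng)
    # Assumed initialization: weak side information plus standard Gaussian noise
    m = amp(a, lam, 0.3 * x + rng.normal(size=n))
    print("overlap:", abs(np.mean(np.sign(m) * x)))
```

In this single-view normalization the weak-recovery threshold is $\lambda = 1$, so running the sketch with $\lambda$ well above 1 should give an overlap close to one, while values below 1 should drive it toward zero. The multi-view setting studied here couples several such iterations through the inter-layer prior, which this sketch does not attempt.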

Paper Structure

This paper contains 51 sections, 40 theorems, 302 equations, 9 figures, and 2 algorithms.

Key Result

Proposition 1.1

For any $\boldsymbol{\lambda}\in[0,\infty)^L$

Figures (9)

  • Figure 1: Illustration of the effective scalar channels \ref{eq:scalar channel} under the inter-layer priors of the multilayer and dynamic SBMs, in Examples \ref{eg:multilayer SBM} and \ref{eg:dynamic SBM} respectively.
  • Figure 2: Illustration of the weak recovery thresholds \ref{eq:threshold-inhomo-SBM} for the multilayer inhomogeneous SBM.
  • Figure 3: This figure plots the mapping $t\in[0,2]\mapsto T_l^{\mathsf{ML},\rho}(t\boldsymbol{\gamma})$ for four different choices of $\boldsymbol{\gamma}\in\mathbb{R}_+^3$ and six different choices of $\rho\in[0,1/2]$, in the context of the $3$-layer inhomogeneous SBM of Example \ref{eg:multilayer SBM}. Conjecture \ref{conjecture:ML} claims that these mappings are concave on all of $t\in[0,\infty)$; this is a key condition for showing the optimality of Algorithm \ref{alg:coupled AMP}, as in Proposition \ref{prop:ML conjectured}.
  • Figure 4: Simulation results on an inhomogeneous $2$-layer model, formally defined in Example \ref{eg:multilayer SBM}. Algorithm \ref{alg:coupled AMP} is tested on spiked matrices and on sparse graphs with $d=(20,30)$ or $d=(4,6)$. Color brightness reflects global membership recovery accuracy averaged over $10$ repetitions. The empirical feasibility boundary matches the theoretically predicted red curve well. Although our theory applies only to denser graphs with $d\rightarrow\infty$ as $n$ increases, the simulation suggests that the boundary is critical even for extremely sparse graphs: $d$ is set to $\sim5$ compared to $n=10000$.
  • Figure 5: Weak recovery thresholds of dynamic SBM when $\lambda^{(1)}=\cdots=\lambda^{(L)}$.
  • ...and 4 more figures

Theorems & Definitions (87)

  • Example 1.1: Inhomogeneous Multilayer SBM
  • Conjecture 1.1
  • Example 1.2: Dynamic SBM
  • Conjecture 1.2
  • Example 1.3: SBM with Unbalanced Partially Observed Labels
  • Proposition 1.1
  • Theorem 1.1
  • Proposition 1.2
  • Proposition 1.3
  • Remark 1.1
  • ...and 77 more