
Exact Label Recovery in Euclidean Random Graphs

Julia Gaudio, Charlie Guan, Xiaochun Niu, Ermin Wei

TL;DR

This work introduces the Geometric Hidden Community Model (GHCM), a geometric extension of classic network inference problems, to study exact label recovery in Euclidean random graphs. It characterizes a sharp information-theoretic threshold based on the Chernoff–Hellinger divergence $D_+(\theta_i,\theta_j)$, below which exact recovery is impossible and above which a linear-time, two-phase algorithm achieves exact recovery: Phase I yields an almost-exact labeling via local propagation on a block-structured torus, and Phase II refines this labeling to exact recovery using a genie-aided, maximum-likelihood approach. The framework subsumes notable models like the Geometric Stochastic Block Model (GSBM), geometric $\mathbb{Z}_2$ synchronization, and geometric submatrix localization, with the threshold driven by the local-to-global amplification phenomenon. The algorithm runs in $O(n \log n)$ time and remains robust to monotone adversaries in the two-community setting, leveraging dense local structure and a connectivity-guaranteed visibility graph. The results advance understanding of when geometry enables efficient exact recovery and open directions for relaxing distinctness assumptions and exploring broader regimes and perturbations in geometric inference.
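
The summary above uses $D_+(\theta_i,\theta_j)$ without defining it. For orientation, the form of the Chernoff–Hellinger divergence commonly used in the stochastic block model literature is sketched below. This is an assumption about notation: here $\theta_i$ and $\theta_j$ are treated as nonnegative vectors indexed by $x$, and the paper's own definition (for example, how priors and continuous weight distributions enter) may differ.

```latex
D_+(\theta_i, \theta_j)
  = \max_{t \in [0,1]} \sum_{x}
    \Big( t\,\theta_i(x) + (1-t)\,\theta_j(x)
          - \theta_i(x)^{t}\,\theta_j(x)^{1-t} \Big)
```

When both vectors are probability distributions, the linear terms sum to 1 and the expression reduces to $1 - \min_{t\in[0,1]} \sum_x \theta_i(x)^t \theta_j(x)^{1-t}$; taking $t = 1/2$ gives a Hellinger-type quantity, which is where the name comes from.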

Abstract

In this paper, we propose a family of label recovery problems on weighted Euclidean random graphs. The vertices of a graph are embedded in $\mathbb{R}^d$ according to a Poisson point process, and are assigned to a discrete community label. Our goal is to infer the vertex labels, given edge weights whose distributions depend on the vertex labels as well as their geometric positions. Our general model provides a geometric extension of popular graph and matrix problems, including submatrix localization and $\mathbb{Z}_2$-synchronization, and includes the Geometric Stochastic Block Model (proposed by Sankararaman and Baccelli) as a special case. We study the fundamental limits of exact recovery of the vertex labels. Under a mild distinctness of distributions assumption, we determine the information-theoretic threshold for exact label recovery, in terms of a Chernoff-Hellinger divergence criterion. Impossibility of recovery below the threshold is proven by a unified analysis using a Cramér lower bound. Achievability above the threshold is proven via an efficient two-phase algorithm, where the first phase computes an almost-exact labeling through a local propagation scheme, while the second phase refines the labels. The information-theoretic threshold is dictated by the performance of the so-called genie estimator, which decodes the label of a single vertex given all the other labels. This shows that our proposed models exhibit the local-to-global amplification phenomenon.
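
To make the setup concrete, here is a minimal Python sketch of a two-community special case and of the genie estimator described above. Everything specific in it is an assumption for illustration rather than the paper's construction: the function names, the uniform $\pm 1$ labels, Bernoulli edge observations with parameters `p` (same label) and `q` (different label), a torus of volume `n`, and a visibility radius of $(\log n)^{1/d}$. The sampler is a naive quadratic loop, not the efficient algorithm from the paper.

```python
import math
import random


def sample_poisson(mean, rng):
    """Count rate-1 exponential arrivals in [0, mean]; this is Poisson(mean)."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def torus_dist(x, y, side):
    """Euclidean distance on a torus with the given side length."""
    return math.sqrt(sum(min(abs(a - b), side - abs(a - b)) ** 2
                         for a, b in zip(x, y)))


def sample_geometric_sbm(n, lam, p, q, d=2, seed=0):
    """Hypothetical two-community sampler: Poisson points on a volume-n torus,
    uniform +/-1 labels, Bernoulli observations for pairs within the radius."""
    rng = random.Random(seed)
    side = n ** (1.0 / d)
    radius = math.log(n) ** (1.0 / d)  # assumed visibility radius
    m = sample_poisson(lam * n, rng)
    points = [tuple(rng.uniform(0, side) for _ in range(d)) for _ in range(m)]
    labels = [rng.choice((-1, 1)) for _ in range(m)]
    edges = {}  # (u, v) with u < v -> 0/1 observation, visible pairs only
    for u in range(m):
        for v in range(u + 1, m):
            if torus_dist(points[u], points[v], side) <= radius:
                prob = p if labels[u] == labels[v] else q
                edges[(u, v)] = 1 if rng.random() < prob else 0
    return points, labels, edges


def genie_label(u, labels, edges, p, q):
    """Genie estimator for vertex u: pick the label maximizing the likelihood
    of u's visible observations, given the true labels of all other vertices."""
    def loglik(candidate):
        total = 0.0
        for (a, b), w in edges.items():
            if u not in (a, b):
                continue
            other = b if a == u else a
            prob = p if labels[other] == candidate else q
            total += math.log(prob if w else 1.0 - prob)
        return total
    return max((-1, 1), key=loglik)
```

Calling `sample_geometric_sbm(n=200, lam=1.0, p=0.8, q=0.1)` and then `genie_label(u, labels, edges, 0.8, 0.1)` for each vertex `u` shows the idea behind local-to-global amplification: each vertex is decoded from its local neighborhood alone, and exact recovery of the whole labeling hinges on whether every one of these single-vertex decodings succeeds.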

Paper Structure

This paper contains 38 sections, 50 theorems, 165 equations, 3 figures, 7 algorithms.

Key Result

Theorem 2.1

Any estimator fails to achieve exact recovery for $G\sim \text{GHCM}(\lambda, n, \pi, P, d)$ if

Figures (3)

  • Figure 1: Propagation schedule for $d=1$ and $d=2$.
  • Figure 2: Geometry around block $B_i$, showing a portion of the region $\mathcal{S}_{2,n}$. The set $U_i$ consists of the dark and light blue blocks.
  • Figure 3: Possible isolated components in $\mathcal{S}_{2,n}$ for Proposition \ref{lemma:connectivity}.

Theorems & Definitions (92)

  • Definition 2.1: Geometric Hidden Community Model
  • Definition 2.2: Permissible relabeling
  • Theorem 2.1: Impossibility
  • Conjecture 2.2
  • Theorem 2.3: Achievability
  • Theorem 2.4
  • Theorem 2.5: Theorem 3.7 in Abbe2021
  • Corollary 2.6: Two-community symmetric GSBM
  • Corollary 2.7: General GSBM
  • Proposition 2.8: Geometric $\mathbb{Z}_2$ synchronization
  • ...and 82 more