Exact Label Recovery in Euclidean Random Graphs

Julia Gaudio; Charlie Guan; Xiaochun Niu; Ermin Wei

TL;DR

This work introduces the Geometric Hidden Community Model (GHCM), a geometric extension of classic network inference problems, to study exact label recovery in Euclidean random graphs. It characterizes a sharp information-theoretic threshold in terms of the Chernoff–Hellinger divergence $D_+(\theta_i,\theta_j)$: below the threshold exact recovery is impossible, and above it an efficient two-phase algorithm achieves exact recovery. Phase I yields an almost-exact labeling via local propagation on a block-structured torus, and Phase II refines this labeling to exact recovery using a genie-aided, maximum-likelihood approach. The framework subsumes notable models such as the Geometric Stochastic Block Model (GSBM), geometric $\mathbb{Z}_2$ synchronization, and geometric submatrix localization, with the threshold driven by the local-to-global amplification phenomenon. The algorithm runs in $O(n \log n)$ time and remains robust to monotone adversaries in the two-community setting, leveraging dense local structure and a connectivity-guaranteed visibility graph. The results clarify when geometry enables efficient exact recovery, and leave open the questions of relaxing the distinctness assumption and of extending the analysis to broader regimes and perturbations in geometric inference.
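To give a feel for the kind of quantity the threshold is stated in, here is a minimal sketch of the classical Chernoff–Hellinger divergence between two nonnegative vectors, as it appears in the stochastic block model literature. This is an assumption for illustration: this page does not spell out the definition of $D_+(\theta_i,\theta_j)$, and the GHCM version may differ (for example, by how the weight distributions and community proportions enter). The function name `ch_divergence` and the grid search over $t$ are likewise illustrative choices.

```python
import math


def ch_divergence(mu, nu, grid=2001):
    """Approximate the classical CH-divergence by grid search:

        max over t in [0, 1] of
        sum_x [ t*mu(x) + (1-t)*nu(x) - mu(x)**t * nu(x)**(1-t) ].

    ASSUMPTION: classical SBM-style definition, not necessarily the
    exact D_+ used for the GHCM. Entries are assumed positive.
    """
    best = 0.0  # the objective is 0 at t = 0 and t = 1, so the max is >= 0
    for k in range(grid):
        t = k / (grid - 1)
        value = sum(
            t * a + (1 - t) * b - (a ** t) * (b ** (1 - t))
            for a, b in zip(mu, nu)
        )
        best = max(best, value)
    return best


if __name__ == "__main__":
    # Two hypothetical "community profiles" (e.g. expected neighbor counts).
    theta_1 = [4.0, 1.0]
    theta_2 = [1.0, 4.0]
    print(ch_divergence(theta_1, theta_2))
```

The divergence is zero when the two profiles coincide and grows as they separate, which matches the intuition that well-separated communities are easier to recover exactly.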

Abstract

In this paper, we propose a family of label recovery problems on weighted Euclidean random graphs. The vertices of a graph are embedded in $\mathbb{R}^d$ according to a Poisson point process, and each vertex is assigned a discrete community label. Our goal is to infer the vertex labels, given edge weights whose distributions depend on the vertex labels as well as their geometric positions. Our general model provides a geometric extension of popular graph and matrix problems, including submatrix localization and $\mathbb{Z}_2$-synchronization, and includes the Geometric Stochastic Block Model (proposed by Sankararaman and Baccelli) as a special case. We study the fundamental limits of exact recovery of the vertex labels. Under a mild distinctness-of-distributions assumption, we determine the information-theoretic threshold for exact label recovery, in terms of a Chernoff–Hellinger divergence criterion. Impossibility of recovery below the threshold is proven by a unified analysis using a Cramér lower bound. Achievability above the threshold is proven via an efficient two-phase algorithm, where the first phase computes an almost-exact labeling through a local propagation scheme, while the second phase refines the labels. The information-theoretic threshold is dictated by the performance of the so-called genie estimator, which decodes the label of a single vertex given all the other labels. This shows that our proposed models exhibit the local-to-global amplification phenomenon.
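The genie estimator described above can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: it assumes that edge weights are observed only between vertices within a visibility radius on a torus, that weights are conditionally independent given the labels, and that a log-density function for each label pair is available. The names `torus_distance`, `genie_estimate`, and `gaussian_log_density`, and the Gaussian weight model in the usage example, are all hypothetical.

```python
import math


def torus_distance(a, b, side):
    """Euclidean distance between points a and b on a torus of side length `side`."""
    return math.sqrt(
        sum(min(abs(x - y), side - abs(x - y)) ** 2 for x, y in zip(a, b))
    )


def genie_estimate(u, positions, labels, weights, log_prior, log_density, radius, side):
    """Decode the label of vertex u given the true labels of all other vertices.

    ASSUMPTIONS (illustrative): only pairs within `radius` carry a weight,
    and weights are independent given the labels. The score is the log
    posterior up to a constant; with a uniform prior this is maximum likelihood.
    """
    best_label, best_score = None, -math.inf
    for candidate, prior in log_prior.items():
        score = prior
        for v, pos in positions.items():
            if v == u or torus_distance(positions[u], pos, side) > radius:
                continue  # vertex v is not visible from u
            w = weights[frozenset((u, v))]
            score += log_density(w, candidate, labels[v])
        if score > best_score:
            best_label, best_score = candidate, score
    return best_label


def gaussian_log_density(w, a, b, mu=1.0):
    """Hypothetical synchronization-style model: weight ~ Normal(mu * a * b, 1)."""
    return -0.5 * (w - mu * a * b) ** 2


if __name__ == "__main__":
    positions = {0: (0.0,), 1: (0.5,), 2: (9.5,), 3: (5.0,)}  # 1-D torus, side 10
    labels = {1: 1, 2: -1, 3: 1}                               # labels of the others
    weights = {frozenset((0, 1)): 0.9, frozenset((0, 2)): -1.2}
    log_prior = {1: math.log(0.5), -1: math.log(0.5)}
    print(genie_estimate(0, positions, labels, weights, log_prior,
                         gaussian_log_density, radius=1.0, side=10.0))
```

Only the neighbors of the target vertex enter the score, which is the local computation behind the local-to-global amplification phenomenon: exact recovery of all labels is governed by how well a single label can be decoded from its neighborhood.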

Paper Structure (38 sections, 50 theorems, 165 equations, 3 figures, 7 algorithms)

Introduction
The GSBM
General Inference Problems on Geometric Graphs
Further Related Work
Notation and Organization
Models and Main Results
Geometric Hidden Community Model
Fundamental Limits for Exact Recovery
Special Cases of the GHCM
Robustness Under Monotone Adversaries
Exact Recovery Algorithm
Exact Recovery for
Exact Recovery for General
Proof Outline
Discussion and Future Work
...and 23 more sections

Key Result

Theorem 2.1

Any estimator fails to achieve exact recovery for $G\sim \text{GHCM}(\lambda, n, \pi, P, d)$ if

Figures (3)

Figure 1: Propagation schedule for $d=1$ and $d=2$.
Figure 2: Geometry around block $B_i$, showing a portion of the region $\mathcal{S}_{2,n}$. The set $U_i$ is comprised of dark and light blue blocks.
Figure 3: Possible isolated components in $\mathcal{S}_{2,n}$ for Proposition \ref{lemma:connectivity}.

Theorems & Definitions (92)

Definition 2.1: Geometric Hidden Community Model
Definition 2.2: Permissible relabeling
Theorem 2.1: Impossibility
Conjecture 2.2
Theorem 2.3: Achievability
Theorem 2.4
Theorem 2.5: Theorem 3.7 in Abbe2021
Corollary 2.6: Two-community symmetric GSBM
Corollary 2.7: General GSBM
Proposition 2.8: Geometric $\mathbb{Z}_2$ synchronization
...and 82 more
