Detection of local geometry in random graphs: information-theoretic and computational limits

Jinho Bok, Shuangping Li, Sophie H. Yu

Abstract

We study the problem of detecting local geometry in random graphs. We introduce a model $\mathcal{G}(n, p, d, k)$, where a hidden community of average size $k$ has edges drawn as a random geometric graph on $\mathbb{S}^{d-1}$, while all remaining edges follow the Erdős--Rényi model $\mathcal{G}(n, p)$. The random geometric graph is generated by thresholding inner products of latent vectors on $\mathbb{S}^{d-1}$, with each edge having marginal probability equal to $p$. This implies that $\mathcal{G}(n, p, d, k)$ and $\mathcal{G}(n, p)$ are indistinguishable at the level of the marginals, and the signal lies entirely in the edge dependencies induced by the local geometry. We investigate both the information-theoretic and computational limits of detection. On the information-theoretic side, our upper bounds follow from three tests based on signed triangle counts: a global test, a scan test, and a constrained scan test; our lower bounds follow from two complementary methods: truncated second moment via Wishart--GOE comparison, and tensorization of KL divergence. These results together settle the detection threshold at $d = \widetilde{\Theta}(k^2 \vee k^6/n^3)$ for fixed $p$, and extend the state-of-the-art bounds from the full model (i.e., $k = n$) for vanishing $p$. On the computational side, we identify a computational--statistical gap and provide evidence via the low-degree polynomial framework, as well as the suboptimality of signed cycle counts of length $\ell \geq 4$.
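
To make the model description concrete, the following is a minimal Python sketch of sampling from $\mathcal{G}(n, p, d, k)$ and computing a signed triangle count. It is an illustration under stated assumptions, not the paper's construction: community membership is assumed to be independent with probability $k/n$ (one way to obtain average size $k$), the threshold is estimated by Monte Carlo rather than computed exactly, and the statistic is taken in the commonly used form $\sum_{i<j<l} (A_{ij}-p)(A_{il}-p)(A_{jl}-p)$. All function names are hypothetical.

```python
import math
import random


def uniform_sphere(d, rng):
    """Draw a uniform point on S^{d-1} by normalizing a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def estimate_threshold(p, d, rng, samples=20000):
    """Monte Carlo estimate of tau such that P(<x, y> >= tau) is roughly p.

    By rotational symmetry, <x, y> for independent uniform x, y has the same
    law as the first coordinate of a single uniform point on the sphere.
    """
    coords = sorted(uniform_sphere(d, rng)[0] for _ in range(samples))
    idx = min(samples - 1, int((1.0 - p) * samples))
    return coords[idx]


def sample_graph(n, p, d, k, seed=0):
    """Sample an adjacency matrix and the planted community (illustrative)."""
    rng = random.Random(seed)
    tau = estimate_threshold(p, d, rng)
    # Assumption: each vertex joins the community independently w.p. k/n.
    community = [i for i in range(n) if rng.random() < k / n]
    latent = {i: uniform_sphere(d, rng) for i in community}
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if i in latent and j in latent:
                # Geometric edge: threshold the inner product of latent vectors.
                dot = sum(a * b for a, b in zip(latent[i], latent[j]))
                edge = 1 if dot >= tau else 0
            else:
                # Background edge: independent Bernoulli(p), as in G(n, p).
                edge = 1 if rng.random() < p else 0
            adj[i][j] = adj[j][i] = edge
    return adj, community


def signed_triangle_count(adj, p):
    """Sum over vertex triples of the product of centered edge indicators."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            for l in range(j + 1, n):
                total += (adj[i][j] - p) * (adj[i][l] - p) * (adj[j][l] - p)
    return total
```

For example, `adj, community = sample_graph(20, 0.18, 2, 7)` followed by `signed_triangle_count(adj, 0.18)` mirrors the parameters of Figure 1. Because every edge has marginal probability $p$, edge counts alone carry no signal; the centered triple product is sensitive to the dependence between edges inside the community.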

Paper Structure

This paper contains 41 sections, 27 theorems, 267 equations, 2 figures.

Key Result

Theorem 1.3

Let where $0 \leq \alpha < 1$, $\beta > 0$, and $0 < \gamma \leq 1$.

Figures (2)

  • Figure 1: Two drawings of the same graph sampled from $\mathcal{G}(n, p, d, k)$ with $n=20$, $p=0.18$, $k=7$; we set $d=2$ for visualization purposes. (a) Vertices are positioned to reflect the latent geometry: the $k=7$ community vertices (circled, teal) have latent vectors drawn from $\mathcal{U}(\mathbb{S}^1)$, so they lie on a circle in the latent space. The orange edges, which are based on geometric proximity, reveal the resulting cycle-rich structure induced by the local geometry. (b) Vertices are positioned randomly, and the planted community becomes visually indistinguishable from the Erdős--Rényi background.
  • Figure 2: Phase diagram for detection, for (a) $\alpha = 0$ and (b) $\alpha = 1/3$. Note that the two plots are in different scales. In the possible & easy phase (green), strong detection can be done by an efficient test statistic. In the possible & hard phase (yellow), strong detection can be done by an inefficient test statistic, and weak detection is impossible for low-degree polynomial algorithms. In the unknown & hard phase (grey), it is open whether strong detection is possible, but weak detection is impossible for low-degree polynomial algorithms. Finally, in the impossible phase (magenta), weak detection is impossible. The impossible phases extend to all values of $\beta > 0$ beyond those presented in the plots.

Theorems & Definitions (41)

  • Definition 1.1: Random graphs with local high-dimensional geometry
  • Definition 1.2
  • Theorem 1.3: Informal
  • Theorem 2.1: Detection via global test
  • Theorem 2.2: Detection via scan test
  • Theorem 2.3: Detection via constrained scan test
  • Remark 2.4: Comparison between tests
  • Theorem 2.5: Lower bound via truncated second moment
  • Theorem 2.6: Lower bound via tensorization
  • Remark 2.7: Comparison between lower bounds
  • ...and 31 more