Information-Theoretic Thresholds for Bipartite Latent-Space Graphs Under Noisy Observations

Andreas Göbel; Marcus Pappik; Leon Schiller


TL;DR

The paper establishes tight information-theoretic thresholds for detecting latent geometry in bipartite Gaussian random geometric graphs under a binary masking process. It introduces a novel Fourier-analytic framework that bounds signed subgraph counts by exploiting cancellations in the characteristic-function expansions, enabling control over large subgraphs and leading to precise phase diagrams that depend on the latent dimension $d$ and mask density $q$. A conditional second-moment method is developed to derive hardness results and to bridge known-vs-unknown mask settings, showing that knowing the mask lowers the effective sparsity threshold (roughly replacing $q$ with $q^2$ in the analysis). The results imply that there is no computational-statistical gap in the considered regimes and yield efficient tests based on wedges and 4-cycles, with extensions suggested to non-bipartite and sparser settings. Overall, the work advances understanding of latent geometry detectability in noisy, high-dimensional graph models and provides tools potentially applicable to related detection problems.
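For orientation, the standard (unconditional) second-moment bound below shows how total variation reduces to signed subgraph counts. It assumes that the null model $\mathbb{M}(n, m, p)$ has i.i.d. $\mathrm{Bern}(p)$ entries $A_e$ over the $nm$ left-right pairs $e$. The last equality is Parseval's identity in the orthonormal basis $\prod_{e \in H} (A_e - p)/\sqrt{p(1-p)}$ of the $p$-biased hypercube. The paper's conditional variant, which restricts to a good event $S$, is not reproduced here, and Definition 2.1 may normalize signed weights differently.

```latex
% P: planted model, Q = M(n, m, p): assumed i.i.d. Bern(p) entries A_e
d_{\mathrm{TV}}(P, Q) \le \frac{1}{2}\sqrt{\chi^2(P \,\|\, Q)},
\qquad
\chi^2(P \,\|\, Q) = \mathbb{E}_{A \sim Q}\!\left[\Big(\tfrac{dP}{dQ}(A)\Big)^{2}\right] - 1
= \sum_{\emptyset \neq H \subseteq [n] \times [m]}
  \frac{\mathbb{E}_{A \sim P}\big[\prod_{e \in H} (A_e - p)\big]^{2}}{\big(p(1-p)\big)^{|H|}} .
```

If the expected signed weights $\mathbb{E}_{P}[\prod_{e \in H}(A_e - p)]$ are small enough uniformly over all edge sets $H$, including large ones, then the two models are indistinguishable. This is why bounds that hold for large subgraphs matter.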

Abstract

We study information-theoretic phase transitions for the detectability of latent geometry in bipartite random geometric graphs (RGGs) with Gaussian $d$-dimensional latent vectors, where only a subset of edges carries latent information, as determined by a random mask with i.i.d. $\mathrm{Bern}(q)$ entries. For any fixed edge density $p \in (0,1)$, we determine essentially tight thresholds for this problem as a function of $d$ and $q$. Our results show that the detection problem is substantially easier if the mask is known upfront than if it is hidden. Our analysis builds on a novel Fourier-analytic framework for bounding signed subgraph counts in Gaussian random geometric graphs, which exploits cancellations that arise after approximating characteristic functions by an appropriate power series. The resulting bounds apply to much larger subgraphs than those considered in previous work, which enables tight information-theoretic bounds; the bounds in previous work only yield lower bounds through the lens of low-degree polynomials. As a consequence, we identify the optimal information-theoretic thresholds and rule out computational-statistical gaps. Our bounds further improve upon the bounds on Fourier coefficients of random geometric graphs recently given by Bangachev and Bresler [STOC'24] in the dense, bipartite case. The techniques also extend to sparser and non-bipartite settings, at least if the considered subgraphs are sufficiently small. We further believe that they might help resolve open questions for related detection problems.
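To make the model concrete, here is a minimal sampling sketch in the spirit of $\mathbb{W}(n, m, q, p, d)$ and $\mathbb{M}(n, m, p)$. It does not implement the paper's exact Definitions 1.1 and 1.2. The edge rule (inner product above a threshold), the normal-approximation threshold `t`, and the function names `sample_masked_rgg` and `sample_null` are all hypothetical choices for illustration. Returning the mask `M` alongside `A` lets the same sampler serve the known-mask setting.

```python
import math
import random
from statistics import NormalDist


def _gaussian_vectors(count, d, rng):
    """`count` independent standard Gaussian vectors in R^d."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(count)]


def sample_masked_rgg(n, m, q, p, d, seed=None):
    """Hypothetical sampler in the spirit of W(n, m, q, p, d).

    ASSUMPTIONS (not the paper's exact definition):
      * left/right latent vectors are i.i.d. N(0, I_d);
      * a masked pair (i, j) is an edge iff <x_i, y_j> >= t, where
        t = sqrt(d) * Phi^{-1}(1 - p) uses a normal approximation of
        <x_i, y_j>, so the edge probability is only approximately p;
      * an unmasked pair is an independent Bern(p) coin flip.
    Returns the n x m adjacency matrix A and the mask M (both 0/1).
    """
    rng = random.Random(seed)
    xs = _gaussian_vectors(n, d, rng)
    ys = _gaussian_vectors(m, d, rng)
    t = math.sqrt(d) * NormalDist().inv_cdf(1.0 - p)
    A = [[0] * m for _ in range(n)]
    M = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if rng.random() < q:  # entry carries latent information
                M[i][j] = 1
                dot = sum(a * b for a, b in zip(xs[i], ys[j]))
                A[i][j] = int(dot >= t)
            else:  # pure noise, independent of the geometry
                A[i][j] = int(rng.random() < p)
    return A, M


def sample_null(n, m, p, seed=None):
    """Null model, assumed here to be M(n, m, p) with i.i.d. Bern(p) entries."""
    rng = random.Random(seed)
    return [[int(rng.random() < p) for _ in range(m)] for _ in range(n)]
```

For $p = \frac{1}{2}$ the threshold is exactly $t = 0$, so by symmetry each masked entry is an edge with probability exactly $\frac{1}{2}$. For other $p$, the density only approximately matches $p$.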

Paper Structure (26 sections, 31 theorems, 156 equations, 2 figures)

Introduction
The model and associated testing problems
Results
Technical Contributions
Second moment method
Challenges arising from previous work
Bounding total variation in terms of signed subgraph counts
Bounding expected signed weights
Information-theoretic bounds via a conditional second moment method
Bounding signed weights after cancellations in Fourier space
Outlook
Preliminaries
Information-Theoretic Hardness
The case of arbitrary $p$
Identifying the good event $S$ and deriving an expression for $\mathbb{E}\left[ \textsc{Sw}(K_{1,\alpha}) \right]$
...and 11 more sections

Key Result

Theorem 1.5

Consider any fixed $p \in (0,1)$. Then the following holds whenever $d \gg \log(n)^3$.

If $p \neq \frac{1}{2}$, then: …

If $p = \frac{1}{2}$, then: …

Here, $o(1)$ denotes a function that tends to zero as $n \rightarrow \infty$. Recall further that $m \ge n$.

Figures (2)

Figure 1: Illustration of the matrices sampled from $\mathbb{W}(n, m, q, p, d)$ for different $d$ and $q$. The rows and columns are ordered by the first coordinate of the latent vectors. In our problem, an algorithm does not have access to this information; instead, it would observe the matrices shown above with their rows and columns randomly permuted. We sorted the rows and columns only for the sake of visualization.
Figure 2: Phase diagrams for $q = n^{-\beta}$, $d = n^{\alpha}$, and $n = m$. The colors represent the following regimes: In the green regime (falling pattern), $\mathbb{W}(n, m, q, p, d)$ and $\mathbb{M}(n, m, p)$ are distinguishable, even if the mask is not given. Efficient tests are given by counting signed wedges and signed 4-cycles. In the yellow regime (rising and falling pattern), the two models are indistinguishable if the mask is unknown but distinguishable if it is known. Efficient tests are again given by counting signed wedges and signed 4-cycles, but restricted to the mask. In the red regime (rising pattern), the models are indistinguishable even if the mask is given. For $p \neq \frac{1}{2}$, counting signed wedges supersedes counting signed $4$-cycles as an optimal test statistic for $\alpha \le 1$ (vertical dashed line). In contrast, for $p=\frac{1}{2}$, counting signed wedges has no statistical power, and the testable regime is purely determined by the statistical power of signed $4$-cycles.
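The wedge and 4-cycle tests from the caption can be sketched as follows. Each copy of a subgraph is assumed to contribute the product of its centered entries $A_e - p$; Definition 2.1's exact normalization may differ. For the known-mask variant, "restricted to the mask" is interpreted here as zeroing every unmasked entry. The helper names `signed_wedges` and `signed_four_cycles` are hypothetical.

```python
from itertools import combinations


def centered(A, p, M=None):
    """Entries A_ij - p; if a mask M is given, unmasked entries become 0."""
    n, m = len(A), len(A[0])
    return [[(A[i][j] - p) if (M is None or M[i][j]) else 0.0
             for j in range(m)] for i in range(n)]


def _pair_sum(values):
    """Sum of v_a * v_b over unordered index pairs a < b."""
    s = sum(values)
    s2 = sum(v * v for v in values)
    return (s * s - s2) / 2


def signed_wedges(A, p, M=None):
    """Signed count of paths of length 2 (wedges) in the bipartite graph."""
    B = centered(A, p, M)
    left = sum(_pair_sum(row) for row in B)        # centered at left vertices
    right = sum(_pair_sum(col) for col in zip(*B))  # centered at right vertices
    return left + right


def signed_four_cycles(A, p, M=None):
    """Signed count of 4-cycles i - j - k - j' - i (i < k left, j < j' right)."""
    B = centered(A, p, M)
    total = 0.0
    for r1, r2 in combinations(B, 2):
        # Products B_ij * B_kj over shared right vertices j; pairs j < j' close a 4-cycle.
        total += _pair_sum([a * b for a, b in zip(r1, r2)])
    return total


# Usage: unknown mask -> signed_four_cycles(A, p); known mask -> signed_four_cycles(A, p, M)
```

Under a null model with independent $\mathrm{Bern}(p)$ entries, every summand is a product of distinct independent centered entries, so both statistics have mean zero. The `_pair_sum` identity $\sum_{a<b} v_a v_b = \frac{1}{2}\big((\sum_a v_a)^2 - \sum_a v_a^2\big)$ keeps the wedge count linear in the number of entries and the 4-cycle count at $O(n^2 m)$ operations. How to set the rejection threshold is not specified here.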

Theorems & Definitions (60)

Definition 1.1: The distributions $\mathbb{W}(n, m, p, d)$ and $\mathbb{M}(n, m, p)$
Definition 1.2: The distributions $\mathbb{W}(n, m, q, p, d)$ and $\mathbb{W}_{\mathbf{M}}(n, m, p, d)$
Theorem 1.5: Information-theoretic thresholds for unknown masks
Theorem 1.6: Information-theoretic hardness for known masks
Definition 2.1: Signed weights
Proposition 2.1: Bound on conditional signed weights
Corollary 2.2: Bound on unconditional signed weights
proof
Lemma 4.1: Bound on Hermite polynomials, Inequality (1.2) in van1990new
Lemma 4.2: Stirling's approximation for the Gamma function
...and 50 more
