Optimal community detection in dense bipartite graphs

Julien Chhor, Parker Knight

TL;DR

This work addresses detecting a planted dense bipartite subgraph in a bipartite Erdős–Rényi graph, distinguishing $H_0: P_{ij}=p_0$ from $H_1$, where an unknown $k_1 \times k_2$ submatrix has elevated edge probability $p_0+\delta$. It derives a non-asymptotic minimax separation rate $(\delta^*)^2 \asymp p_0(1-p_0)R$, with $R$ defined via combinations of problem-geometry terms $\psi, \phi, \beta$, and proves matching lower and upper bounds under a dense-graph assumption. The proposed minimax-optimal detector $\Delta^*$ combines a total-degree test with novel truncated-degree and max-truncated-degree tests based on truncated nonlinear statistics of the adjacency matrix, achieving the fundamental limit in dense regimes. The results reveal refined detection boundaries for imbalanced bipartite graphs and provide new tools (truncated $\chi^2$-type statistics in a matrix Bernoulli setting) with potential interest beyond this specific problem. The work further discusses adaptivity, extensions to sparsity, and computational considerations, outlining directions for future research.

Abstract

We consider the problem of detecting a community of densely connected vertices in a high-dimensional bipartite graph of size $n_1 \times n_2$. Under the null hypothesis, the observed graph is drawn from a bipartite Erdős–Rényi distribution with connection probability $p_0$. Under the alternative hypothesis, there exists an unknown bipartite subgraph of size $k_1 \times k_2$ in which edges appear with probability $p_1 = p_0 + \delta$ for some $\delta > 0$, while all edges outside the subgraph appear with probability $p_0$. Specifically, we provide non-asymptotic upper and lower bounds on the smallest signal strength $\delta^*$ that is both necessary and sufficient to ensure the existence of a test with small enough type one and type two errors. We also derive novel minimax-optimal tests achieving these fundamental limits when the underlying graph is sufficiently dense. Our proposed tests involve a combination of hard-thresholded nonlinear statistics of the adjacency matrix, the analysis of which may be of independent interest. In contrast with previous work, our non-asymptotic upper and lower bounds match for any configuration of $n_1, n_2, k_1, k_2$.
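
The testing problem in the abstract can be simulated in a few lines. The sketch below is illustrative only: the function names, the use of NumPy, and the exact standardization of the total-degree statistic are assumptions made here, not the paper's definitions, and the truncated-degree and max-truncated-degree tests are not reproduced because their form is not given in this summary.

```python
import numpy as np


def sample_bipartite_graph(n1, n2, p0, rng, k1=0, k2=0, delta=0.0):
    """Sample an n1 x n2 biadjacency matrix with independent Bernoulli edges.

    With k1 = k2 = 0 this is the null model (every edge has probability p0).
    Otherwise a k1 x k2 block, placed on uniformly chosen rows and columns,
    has its edge probability raised to p0 + delta (the alternative).
    """
    probs = np.full((n1, n2), p0, dtype=float)
    if k1 > 0 and k2 > 0:
        rows = rng.choice(n1, size=k1, replace=False)
        cols = rng.choice(n2, size=k2, replace=False)
        probs[np.ix_(rows, cols)] += delta
    return (rng.random((n1, n2)) < probs).astype(np.int64)


def total_degree_statistic(A, p0):
    """Centered and scaled edge count (hypothetical standardization).

    Under the null, the edge count is Binomial(n1 * n2, p0), so this
    quantity has mean 0 and variance 1.
    """
    n1, n2 = A.shape
    n_pairs = n1 * n2
    return (A.sum() - n_pairs * p0) / np.sqrt(n_pairs * p0 * (1.0 - p0))
```

For example, calling `sample_bipartite_graph(200, 300, 0.3, np.random.default_rng(0), k1=50, k2=60, delta=0.3)` and passing the result to `total_degree_statistic` shifts the mean of the statistic by $k_1 k_2 \delta / \sqrt{n_1 n_2 p_0 (1-p_0)}$ relative to the null. A test based on the edge count alone therefore depends on the planted block only through the product $k_1 k_2$, which is consistent with the summary's statement that the detector combines it with further statistics.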

Paper Structure

This paper contains 27 sections, 42 theorems, 280 equations.

Key Result

Theorem 1

Let $\eta \in [0,1]$ be given. There exist constants $c_\delta, \bar{C} > 0$ such that if $\delta^2 \leq c_\delta p_0(1-p_0)R$, with (eq_def_phi) defined with $C = \bar{C}$, then it holds

Theorems & Definitions (77)

  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Theorem 2
  • Proposition 3
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • ...and 67 more