A generalized hypothesis test for community structure in networks

Eric Yanchenko, Srijan Sengupta

TL;DR

The paper addresses whether observed networks exhibit meaningful community structure by introducing a model-agnostic measure, the Expected Edge Density Difference (E2D2) with parameter $\gamma(\cdot)$ and its maximum $\tilde{\gamma}(P)$, and an associated statistic $\tilde{T}(A)$. It develops two hypothesis-testing frameworks: a baseline-value test with an asymptotic cutoff and a baseline-model bootstrap test against nulls such as Erdős-Rényi and Chung-Lu, supported by theoretical bootstrap results. The E2D2-based approach connects to modularity while allowing flexible null models, enabling principled significance testing and richer interpretation; this is demonstrated via simulations and real data (DBLP and hospital networks). The work provides practical, extensible tools for network analysis and outlines future directions, including sequential testing and extensions to more complex network structures.

Abstract

Researchers theorize that many real-world networks exhibit community structure where within-community edges are more likely than between-community edges. While numerous methods exist to cluster nodes into different communities, less work has addressed this question: given some network, does it exhibit statistically meaningful community structure? We answer this question in a principled manner by framing it as a statistical hypothesis test in terms of a general and model-agnostic community structure parameter. Leveraging this parameter, we propose a simple and interpretable test statistic used to formulate two separate hypothesis testing frameworks. The first is an asymptotic test against a baseline value of the parameter while the second tests against a baseline model using bootstrap-based thresholds. We prove theoretical properties of these tests and demonstrate how the proposed method yields rich insights into real-world data sets.
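To make the baseline-model idea concrete, the following is a minimal sketch of a parametric bootstrap test against an Erdős-Rényi null. It rests on several simplifying assumptions that are not taken from the paper: the statistic is a plain within-minus-between edge density difference evaluated at one fixed candidate labelling (the paper's statistic is a maximum over labellings and may include scaling not reproduced here), and all function names (`density_difference`, `sample_er`, `sample_two_block`, `bootstrap_pvalue`) are hypothetical.

```python
import numpy as np


def density_difference(A, labels):
    """Within-community minus between-community edge density for ONE labelling.

    Assumption: a simplified stand-in for the paper's statistic, which
    maximizes over labellings.
    """
    labels = np.asarray(labels)
    iu = np.triu_indices(A.shape[0], k=1)      # each unordered node pair once
    same = labels[iu[0]] == labels[iu[1]]      # True where a pair shares a community
    edges = A[iu]
    if same.all() or not same.any():
        return 0.0                             # only one pair type exists; use 0
    return edges[same].mean() - edges[~same].mean()


def sample_er(n, p, rng):
    """Draw a symmetric, loop-free Erdős-Rényi adjacency matrix."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(int)
    return upper + upper.T


def sample_two_block(n, p_in, p_out, rng):
    """Draw a two-community planted-partition network (n should be even)."""
    labels = np.repeat([0, 1], n // 2)
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((len(labels), len(labels))) < P, k=1).astype(int)
    return upper + upper.T, labels


def bootstrap_pvalue(A, labels, n_boot=200, seed=0):
    """Compare the observed statistic with draws from a fitted Erdős-Rényi null."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    p_hat = A[np.triu_indices(n, k=1)].mean()  # fitted overall edge probability
    observed = density_difference(A, labels)
    null_stats = np.array(
        [density_difference(sample_er(n, p_hat, rng), labels) for _ in range(n_boot)]
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(null_stats >= observed)) / (n_boot + 1)
    return observed, p_value


rng = np.random.default_rng(1)
A, labels = sample_two_block(60, p_in=0.5, p_out=0.05, rng=rng)
observed, p_value = bootstrap_pvalue(A, labels)
print(f"observed difference = {observed:.3f}, bootstrap p-value = {p_value:.3f}")
```

Swapping `sample_er` for a sampler that preserves expected degrees would give a Chung-Lu-style null in the same template. Because the labelling here is fixed rather than optimized on each bootstrap draw, this sketch illustrates the mechanics of the bootstrap threshold only and should not be read as the authors' procedure.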

Paper Structure

This paper contains 18 sections, 4 theorems, 68 equations, 2 figures, 1 table, 2 algorithms.

Key Result

Theorem 2.1

Let $A\sim P$ and consider testing $H_0:\tilde{\gamma}(P)\leq\gamma_0$ as in (eq:h0math). Let A1 and A2 be true and consider the cutoff where $k_n=\{(\log K_n)/n\}^{1/2}$ and arbitrarily small $\epsilon>0$ chosen by the user. Then when the null hypothesis is true $(\tilde{\gamma}(P)\leq\gamma_0)$, the type-I error goes to 0, i.e., for any $\eta>0$, If the alternative hypothesis is true $(\tilde{

Figures (2)

  • Figure 1: Rejection rates from simulation study. See Section 4 for complete details. (a) baseline-value null with fixed $\tilde{\gamma}(P)$; (b) baseline-value null with fixed $n$; (c) Erdős-Rényi null with fixed $\tilde{\gamma}(P)$; (d) Erdős-Rényi null with fixed $n$; (e) Chung-Lu null with fixed $\tilde{\gamma}(P)$; (f) Chung-Lu null with fixed $n$
  • Figure 2: Histograms of bootstrap samples from the proposed method for the two real data sets. The orange histogram is with the Erdős-Rényi null, and the blue histogram is with the Chung-Lu null. The vertical line (black) indicates the value of the test statistic.

Theorems & Definitions (4)

  • Theorem 2.1
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3