A generalized hypothesis test for community structure in networks
Eric Yanchenko, Srijan Sengupta
TL;DR
The paper asks whether an observed network exhibits meaningful community structure. It introduces a model-agnostic measure, the Expected Edge Density Difference (E2D2), with parameter γ(·) and its maximum \tilde{γ}(P), along with an associated test statistic \tilde{T}(A). Two hypothesis-testing frameworks are developed: a baseline-value test with an asymptotic cutoff, and a baseline-model bootstrap test against null models such as Erdős-Rényi and Chung-Lu, supported by theoretical results for the bootstrap. The E2D2-based approach connects to modularity while allowing flexible null models, which enables principled significance testing and richer interpretation; this is demonstrated through simulations and real data (DBLP and hospital networks). The resulting tools are practical and extensible, and the paper outlines future directions, including sequential testing and extensions to more complex network structures.
Abstract
Researchers theorize that many real-world networks exhibit community structure, where within-community edges are more likely than between-community edges. While numerous methods exist to cluster nodes into different communities, less work has addressed this question: given some network, does it exhibit statistically meaningful community structure? We answer this question in a principled manner by framing it as a statistical hypothesis test in terms of a general and model-agnostic community structure parameter. Leveraging this parameter, we propose a simple and interpretable test statistic, which we use to formulate two separate hypothesis testing frameworks. The first is an asymptotic test against a baseline value of the parameter, while the second tests against a baseline model using bootstrap-based thresholds. We prove theoretical properties of these tests and demonstrate how the proposed method yields rich insights into real-world data sets.
