
Graph Quasirandomness for Hypothesis Testing of Stochastic Block Models

Kiril Bangachev, Guy Bresler

TL;DR

This work develops a quasirandomness-inspired framework for hypothesis testing between ${\mathbb G}(n,1/2)$ and stochastic block models based on signed subgraph counts. It shows that, in several SBM regimes, the scaled Fourier coefficient $|\Phi(H)|^{1/|V(H)|}$ is approximately maximized by a graph from a small family of simple graphs (stars, which include the single edge, and the 4-cycle; triangles are added in the conjectured extension to graphons). As a result, constant-degree polynomial distinguishers can be replaced by tests based on these counts. A central contribution is the leaf-isolation technique, which together with a comparison to nonnegative models bounds general SBMs by these baseline tests and yields testing guarantees in multiple SBM settings, including diagonal, nonnegative, and two-community models. The results connect Fourier-analytic SBM quantities to partition functions of associated spin systems and give practical statistics computable in near-linear to near-quadratic time, with implications for graphon testing and low-degree hardness frameworks. The paper also presents examples, discusses barriers, and outlines future directions toward sparse regimes, vertex-transitive testing, and broader connections between computation and inference.
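To make the link between $\Phi(H)$ and spin-system partition functions concrete, here is a minimal brute-force sketch. It assumes the $\pm 1$ Fourier normalization $\Phi(H) = \mathbb{E}\big[\prod_{(u,v)\in \mathsf{E}(H)} (2A_{uv}-1)\big]$ under $\mathbb{SBM}(p,Q)$, where $p$ is the vector of community probabilities and $Q$ the matrix of edge probabilities (the notation of the figure captions); the paper's exact normalization may differ. Conditioning on the community labels turns $\Phi(H)$ into a partition function with vertex weights $p$ and edge interaction matrix $2Q-1$. The function names and the example SBM are hypothetical, and the sum has $k^{|\mathsf{V}(H)|}$ terms, so this is only feasible for small $H$.

```python
import itertools
import math


def fourier_coefficient(p, Q, num_vertices, edges):
    """Brute-force Phi(H) for SBM(p, Q), assuming the +/-1 normalization
    Phi(H) = E[prod_{(u,v) in E(H)} (2*A_uv - 1)].

    Conditioning on community labels x gives the spin-model partition function
    sum_x prod_v p[x_v] * prod_{(u,v) in E(H)} (2*Q[x_u][x_v] - 1),
    summed over all k**num_vertices labelings, so only small H are feasible.
    """
    k = len(p)
    total = 0.0
    for labels in itertools.product(range(k), repeat=num_vertices):
        weight = math.prod(p[c] for c in labels)  # vertex weights p
        for u, v in edges:
            weight *= 2 * Q[labels[u]][labels[v]] - 1  # edge interaction 2Q - 1
        total += weight
    return total


def star(t):
    """Star_t: center 0 joined to leaves 1..t (Star_1 is a single edge)."""
    return t + 1, [(0, i) for i in range(1, t + 1)]


def cycle(m):
    """Cycle on vertices 0..m-1."""
    return m, [(i, (i + 1) % m) for i in range(m)]


def scaled_coefficient(p, Q, graph):
    """|Phi(H)|^(1/|V(H)|), the quantity compared across graphs H."""
    num_vertices, edges = graph
    return abs(fourier_coefficient(p, Q, num_vertices, edges)) ** (1 / num_vertices)


# Hypothetical two-community example: balanced communities, denser inside.
p = [0.5, 0.5]
Q = [[0.75, 0.25], [0.25, 0.75]]
candidates = {f"Star_{t}": star(t) for t in (1, 2, 3)}
candidates["4-cycle"] = cycle(4)
for name, graph in candidates.items():
    print(name, scaled_coefficient(p, Q, graph))
```

In this toy model $\sum_b p_b (2Q_{ab}-1) = 0$ for every community $a$, so every star has $\Phi = 0$, while $\Phi(\mathsf{C}_4) = 1/16$ and its scaled coefficient is $1/2$; here the 4-cycle is the member of $\{\text{stars, 4-cycle}\}$ with the largest scaled coefficient.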

Abstract

The celebrated theorem of Chung, Graham, and Wilson on quasirandom graphs implies that if the 4-cycle and edge counts in a graph $G$ are both close to their typical number in $\mathbb{G}(n,1/2),$ then this also holds for the counts of subgraphs isomorphic to $H$ for any $H$ of constant size. We aim to prove a similar statement in which "close" is measured by whether the given (signed) subgraph count can be used as a test between $\mathbb{G}(n,1/2)$ and a stochastic block model $\mathbb{SBM}.$ Quantitatively, this is related to approximately maximizing $H \longrightarrow |\Phi(H)|^{\frac{1}{|\mathsf{V}(H)|}},$ where $\Phi(H)$ is the Fourier coefficient of $\mathbb{SBM}$ indexed by the subgraph $H.$ This formulation turns out to be equivalent to approximately maximizing the partition function of a spin model whose alphabet is the set of community labels of $\mathbb{SBM}.$ We resolve the approximate maximization when $\mathbb{SBM}$ satisfies one of four conditions: 1) the probability of an edge between any two vertices in different communities is exactly $1/2$; 2) the probability of an edge between two vertices from any two communities is at least $1/2$ (this case is also covered in a recent work of Yu, Zadik, and Zhang); 3) the probability of belonging to any given community is at least $c$ for some universal constant $c>0$; 4) $\mathbb{SBM}$ has two communities. In each of these cases, we show that there is an approximate maximizer of $|\Phi(H)|^{\frac{1}{|\mathsf{V}(H)|}}$ in the set $\mathsf{A} = \{\text{stars, 4-cycle}\}.$ This implies that if there exists a constant-degree polynomial test distinguishing $\mathbb{G}(n,1/2)$ and $\mathbb{SBM},$ then the two distributions can also be distinguished via the signed count of some graph in $\mathsf{A}.$ We conjecture that the same holds true for distinguishing $\mathbb{G}(n,1/2)$ and any graphon if we also add triangles to $\mathsf{A}.$
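The tests in question are signed counts of graphs in $\mathsf{A} = \{\text{stars, 4-cycle}\}$. A minimal sketch of these statistics follows, assuming the signed count of $H$ is the sum, over all copies of $H$ in the complete graph on the $n$ vertices, of the product of the edge signs $2A_{ij}-1$. The star counts use an elementary-symmetric-polynomial recurrence per center and the 4-cycle count uses a trace identity; these are standard counting tricks chosen for illustration, not necessarily the algorithms used in the paper, and all function names are hypothetical.

```python
import itertools
import random


def sign_matrix(adj):
    """X[i][j] = 2*A[i][j] - 1 for i != j (+1 edge, -1 non-edge); 0 on the diagonal."""
    n = len(adj)
    return [[0 if i == j else 2 * adj[i][j] - 1 for j in range(n)] for i in range(n)]


def signed_edge_count(X):
    """Signed count of a single edge: sum of X[i][j] over pairs i < j."""
    n = len(X)
    return sum(X[i][j] for i in range(n) for j in range(i + 1, n))


def signed_star_count(X, t):
    """Signed count of Star_t: for each center v, the elementary symmetric
    polynomial e_t of {X[v][u] : u != v}, built by an O(n * t) recurrence."""
    if t == 1:
        # Star_1 is an edge; counting it from each endpoint would double it.
        return signed_edge_count(X)
    total = 0
    for v, row in enumerate(X):
        e = [1] + [0] * t  # e[j] = e_j of the entries processed so far
        for u, x in enumerate(row):
            if u != v:
                for j in range(t, 0, -1):
                    e[j] += e[j - 1] * x
        total += e[t]
    return total


def signed_c4_count(X):
    """Signed 4-cycle count from tr(X^4): subtract closed 4-walks that revisit a
    vertex (each contributes +1), then divide by 8 since each 4-cycle is traced
    from 4 starting vertices in 2 directions. Uses a naive O(n^3) product."""
    n = len(X)
    X2 = [[sum(X[i][k] * X[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    trace_x4 = sum(X2[i][j] ** 2 for i in range(n) for j in range(n))  # X^2 is symmetric
    revisiting_walks = n * (n - 1) + 2 * n * (n - 1) * (n - 2)
    return (trace_x4 - revisiting_walks) // 8


# Usage on a sample from G(n, 1/2), where every signed count has mean 0.
rng = random.Random(0)
n = 30
adj = [[0] * n for _ in range(n)]
for i, j in itertools.combinations(range(n), 2):
    adj[i][j] = adj[j][i] = rng.randint(0, 1)
X = sign_matrix(adj)
print(signed_edge_count(X), signed_star_count(X, 2), signed_c4_count(X))
```

Under $\mathbb{G}(n,1/2)$ the $\pm 1$ characters are orthonormal, so each statistic has mean zero and variance equal to the number of copies of the graph in $K_n$; comparing a statistic against a multiple of that standard deviation is one simple way to turn it into a test, offered here as an illustration rather than as the paper's procedure.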

Paper Structure

This paper contains 81 sections, 37 theorems, 200 equations, 4 figures.

Key Result

Theorem 1.1

The following conditions are equivalent for an $n$-vertex graph $G:$

Figures (4)

  • Figure 1: Illustration of the leaf-isolation inequality over the graph $H$ with leaves $5,6.$ It effectively compares $|\Phi_{\mathbb{SBM}(p,Q)}(H)|$ to $|\Phi_{\mathbb{SBM}(p,Q)}(H'\sqcup \mathsf{Star}_2)| = |\Phi_{\mathbb{SBM}(p,Q)}(H')|\times |\Phi_{\mathbb{SBM}(p,Q)}(\mathsf{Star}_2)|.$ In this comparison, a new vertex $7$ is created, which resolves the issue that $|\mathsf{E}(H)|<|\mathsf{V}(H)|.$
  • Figure 2: Decomposition of $H$ into vertices and edges in the case of in-community cancellations when $H$ is not a tree: $|\mathsf{V}(H)| = a+b+t+s,$ $|\mathsf{E}_H(K,K)|\ge t,$ $|\mathsf{E}_H(K, \mathsf{V}(H)\backslash K)|\ge a+b -1,$ $|\mathsf{E}_H(\mathsf{V}(H)\backslash K, \mathsf{V}(H)\backslash K)|\ge s,$ and $|\mathsf{E}(H)|\ge |\mathsf{V}(H)| = a+ b + t + s.$ The blue circles represent the connected components in $H|_{K}$ and $H|_{\mathsf{V}(H)\backslash K},$ respectively.
  • Figure 3: Decomposition of $H$ into vertices and edges in the case of in-community cancellations when $H$ is a tree: $|\mathsf{V}(H)| = a+b+t+s+2,$ $|\mathsf{E}_H(K,K)|\ge t,$ $|\mathsf{E}_H(K, \mathsf{V}(H)\backslash K)|= a+b -1,$ $|\mathsf{E}_H(\mathsf{V}(H)\backslash K, \mathsf{V}(H)\backslash K)|= s,$ and $|\mathsf{E}(H)|= a+ b + t + s+1.$ The blue circles represent the connected components in $H|_{K}$ and $H|_{\mathsf{V}(H)\backslash K},$ respectively. We have drawn the two leaves to have parents with different labels for the purposes of illustration, but this might not be the case.
  • Figure 4: Decomposition of $H$ into vertices and edges in the case of between-community cancellations: $|\mathsf{V}(H)| = a+b+t+s + |\mathsf{L}_1(K)| + |\mathsf{L}_2(K)|,$ $|\mathsf{E}_H(K,K)|\ge t,$ $|\mathsf{E}_H(K, \mathsf{V}(H)\backslash K)|\ge a+b -1,$ and $|\mathsf{E}_H(\mathsf{V}(H)\backslash K,\mathsf{V}(H)\backslash K)|\ge s.$ The blue circles represent the connected components in $H|_{K}$ and $H|_{\mathsf{V}(H)\backslash K}.$
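The comparison in Figure 1 relies on the factorization $\Phi(H'\sqcup \mathsf{Star}_2) = \Phi(H')\cdot\Phi(\mathsf{Star}_2)$: community labels are drawn independently for each vertex, so the expectation over two vertex-disjoint graphs splits into a product. The sketch below checks this numerically under the same assumed $\pm 1$ normalization of $\Phi$ (an assumption, since the summary does not spell it out); the SBM parameters, the choice $H' = $ triangle, and the helper names are hypothetical.

```python
import itertools
import math


def phi(p, Q, graph):
    """Brute-force Phi(H) for SBM(p, Q) under the assumed +/-1 normalization:
    sum over labelings x of prod_v p[x_v] * prod_{(u,v)} (2*Q[x_u][x_v] - 1)."""
    num_vertices, edges = graph
    total = 0.0
    for labels in itertools.product(range(len(p)), repeat=num_vertices):
        weight = math.prod(p[c] for c in labels)
        for u, v in edges:
            weight *= 2 * Q[labels[u]][labels[v]] - 1
        total += weight
    return total


def disjoint_union(g1, g2):
    """Place g2 on fresh vertices numbered after those of g1."""
    n1, e1 = g1
    n2, e2 = g2
    return n1 + n2, e1 + [(u + n1, v + n1) for u, v in e2]


# Hypothetical SBM(p, Q) and a hypothetical choice H' = triangle.
p = [0.3, 0.7]
Q = [[0.9, 0.4], [0.4, 0.6]]
h_prime = (3, [(0, 1), (1, 2), (0, 2)])
star_2 = (3, [(0, 1), (0, 2)])  # center 0, leaves 1 and 2

lhs = phi(p, Q, disjoint_union(h_prime, star_2))
rhs = phi(p, Q, h_prime) * phi(p, Q, star_2)
print(lhs, rhs, math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-15))
```

The factorization is what lets the leaf-isolation step trade $|\Phi(H)|$ for a product of two smaller coefficients; the sketch only verifies the identity, not the inequality the paper proves.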

Theorems & Definitions (79)

  • Theorem 1.1: chung87
  • Theorem 1.2: yu2024counting
  • Conjecture 1
  • Definition 1
  • Proposition 1.3
  • proof
  • Theorem 1.4: Main Results on Testing and Partition Function Maximization
  • Theorem 1.5: One-to-one comparison of Fourier Coefficients
  • Definition 2: hopkins2017bayesian, hopkins18 specialized to graphs
  • Theorem 2.1: Folklore
  • ...and 69 more