Higher-order accurate two-sample network inference and network hashing

Meijia Shao; Dong Xia; Yuan Zhang; Qiong Wu; Shuo Chen

TL;DR

This paper develops a comprehensive toolbox for two-sample hypothesis testing in network comparison, featuring a novel main method and its variants, all accompanied by strong theoretical guarantees. The method is proved to be power-optimal.

Abstract

Two-sample hypothesis testing for network comparison presents many significant challenges, including: leveraging repeated network observations and known node registration, but without requiring them to operate; relaxing strong structural assumptions; achieving finite-sample higher-order accuracy; handling different network sizes and sparsity levels; fast computation and memory parsimony; controlling false discovery rate (FDR) in multiple testing; and theoretical understandings, particularly regarding finite-sample accuracy and minimax optimality. In this paper, we develop a comprehensive toolbox, featuring a novel main method and its variants, all accompanied by strong theoretical guarantees, to address these challenges. Our method outperforms existing tools in speed and accuracy, and it is proved power-optimal. Our algorithms are user-friendly and versatile in handling various data structures (single or repeated network observations; known or unknown node registration). We also develop an innovative framework for offline hashing and fast querying as a very useful tool for large network databases. We showcase the effectiveness of our method through comprehensive simulations and applications to two real-world datasets, which revealed intriguing new structures.
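One challenge named above is comparing networks of different sizes and sparsity levels. As a minimal sketch of why normalized network moments help here, the Python below samples two networks of different sizes from the same graphon and computes their empirical edge and triangle densities. This is not the paper's test statistic: the function names, the example graphon `0.3 * (x + y)`, and the network sizes are assumptions chosen for illustration, and no variance estimation or Edgeworth correction is attempted.

```python
import numpy as np

def sample_graphon_network(n, graphon, rng):
    """Draw an undirected simple graph with A_ij ~ Bernoulli(graphon(U_i, U_j))."""
    u = rng.uniform(size=n)                      # latent node positions
    p = graphon(u[:, None], u[None, :])          # n x n edge probabilities
    upper = np.triu(rng.uniform(size=(n, n)) < p, k=1)  # strict upper triangle
    return (upper | upper.T).astype(float)       # symmetrize, zero diagonal

def edge_density(A):
    """Fraction of node pairs that are connected."""
    n = A.shape[0]
    return A.sum() / (n * (n - 1))

def triangle_density(A):
    """Fraction of node triples that form a triangle."""
    n = A.shape[0]
    # trace(A^3) counts every triangle 6 times; there are n(n-1)(n-2)/6 triples.
    return np.trace(A @ A @ A) / (n * (n - 1) * (n - 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = lambda x, y: 0.3 * (x + y)               # hypothetical graphon
    A = sample_graphon_network(200, f, rng)      # m = 200 nodes
    B = sample_graphon_network(300, f, rng)      # n = 300 nodes
    print("edge density gap:", edge_density(A) - edge_density(B))
    print("triangle density gap:", triangle_density(A) - triangle_density(B))
```

Because both densities are normalized by the number of node pairs or triples, they stay on the same scale for networks of different sizes. Turning the gap into a test still requires a variance estimate and a reference distribution, which is the part the paper addresses.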

Paper Structure (33 sections, 8 theorems, 46 equations, 12 figures, 9 tables, 4 algorithms)

Introduction
Graphon model and network moments
Two-sample graphon model
Network method of moments
Higher-order accurate method by Edgeworth expansion
Variance estimation and studentization
Characterizing the distribution via Edgeworth expansion
Two-sample test and Cornish-Fisher confidence interval
Network hashing and fast querying
Pooling over multiple networks in the same group
Common node set
Independently selected node sets
Computation acceleration and adapting to degeneracy
Computation acceleration by U-statistic reduction
Handling indeterminate degeneracy
...and 18 more sections

Key Result

Theorem 1

Define the population Edgeworth expansion $G_{m,n}(u)$ for $\widehat{T}_{m,n}+\delta_T$ as in (eq:Gmn). Let $\widehat{G}_{m,n}$ be its empirical version defined above. Then we have

Figures (12)

Figure 1: Comparison of type I error control (Row 1) and power difference (Row 2: $\varpi=0.05$ and Row 3: $\varpi=0.20$). Blue in Row 1 and green in Rows 2 and 3 indicate performance advantage of our method; red and brown indicate disadvantageous comparisons.
Figure 2: Database offline hashing and querying. Row 1: comparison of methods on query accuracy and time cost. In Row 2, we kept the $X$-axis range consistent, but this cuts out some cyan bars on the far left. For plots with complete $X$-axes, see the section on different graphons in the Supplementary Material.
Figure 3: Control of FDR (dashed curves) and test power (solid curves) under different $\mathfrak{q}$ ($H_0$ proportion) and gaps between hypotheses ($\varpi$, marked as "shift" in the plots). Row 1: model 1 (keyword, $m$ nodes) vs. model 2 ($n$ nodes); Row 2: model 1 vs. model 3; Row 3: model 3 vs. model 4. Columns 1--4 are increasing network sizes $m=n\in\{40,80,160,320\}$.
Figure 4: Scenario 1: common node set. Row 1: $m=n=20$; row 2: $m=n=40$.
Figure 5: Scenario 2: independent node sets. Row 1: $m=n=20$; row 2: $m=n=40$.
...and 7 more figures

Theorems & Definitions (9)

Theorem 1: Population and empirical Edgeworth expansions
Theorem 2
Theorem 3
Remark 1
Theorem 4
Theorem 5
Theorem 6
Theorem 7: Asymptotic normality with automatic adaptation to indeterminate degeneracy
Theorem 8
