Predictive Subsampling for Scalable Inference in Networks

Arpan Kumar; Minh Tang; Srijan Sengupta

TL;DR

This work introduces a subsampling-based approach aimed at reducing the computational burden associated with estimation and two-sample hypothesis testing, and develops the methodology under the generalized random dot product graph framework, which affords broad applicability and permits rigorous analysis.

Abstract

Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis testing are crucial. However, the size of modern networks often exceeds the storage and computational capacities of existing methods, making timely, statistically rigorous inference difficult. In this work, we introduce a subsampling-based approach aimed at reducing the computational burden associated with estimation and two-sample hypothesis testing. Our strategy involves selecting a small random subset of nodes from the network, conducting inference on the resulting subgraph, and then using interpolation based on the observed connections between the subsample and the rest of the nodes to estimate the entire graph. We develop the methodology under the generalized random dot product graph framework, which affords broad applicability and permits rigorous analysis. Within this setting, we establish consistency guarantees and corroborate the practical effectiveness of the approach through comprehensive simulation studies.
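The three steps described in the abstract (subsample nodes, embed the subgraph, interpolate to the remaining nodes) can be sketched as follows. This is a minimal illustration and not the authors' algorithm, which this summary does not reproduce. The function names `ase` and `predictive_subsample_embed`, and the use of an ordinary least-squares out-of-sample extension for the interpolation step, are assumptions made for the example.

```python
import numpy as np

def ase(A, d):
    """Adjacency spectral embedding: keep the d eigenpairs of largest |eigenvalue|.

    Returns the embedding U * sqrt(|lambda|) and the eigenvalue signs, which
    play the role of the GRDPG signature matrix.
    """
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:d]
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.abs(vals)), np.sign(vals)

def predictive_subsample_embed(A, m, d, rng):
    """Hypothetical sketch: embed a random m-node subgraph, then extend to all nodes."""
    n = A.shape[0]
    S = rng.choice(n, size=m, replace=False)      # uniform random subsample
    rest = np.setdiff1d(np.arange(n), S)          # nodes outside the subsample

    # Step 1: inference on the subgraph only (cost depends on m, not n).
    X_S, signs = ase(A[np.ix_(S, S)], d)

    # Step 2 (assumed interpolation): for a node i outside S, the observed
    # edges A[S, i] are modeled as (X_S * signs) @ x_i, so x_i is recovered
    # by least squares against the subsample embedding.
    B = X_S * signs
    X_rest = np.linalg.lstsq(B, A[np.ix_(S, rest)], rcond=None)[0].T

    X_hat = np.empty((n, d))
    X_hat[S] = X_S
    X_hat[rest] = X_rest
    return X_hat, signs, S
```

The estimated edge-probability matrix is then `(X_hat * signs) @ X_hat.T`. In this sketch the eigendecomposition is applied only to the m-by-m subgraph, and the remaining nodes are handled by a single least-squares solve, which is where the computational saving would come from.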

Paper Structure (24 sections, 9 theorems, 110 equations, 1 figure, 6 tables, 3 algorithms)

Introduction
Notations
Method
Estimation with Subsampling
Hypothesis testing with Subsampling
An Alternative Testing Method using Single Subgraph
Theoretical Results
Setup
Consistency of Estimation
Consistency of Testing with Subsampling
Simulation Study
Estimation
Hypothesis testing
Real Data
Coauth-DBLP Dataset
...and 9 more sections

Key Result

Theorem 2

Let $A \thicksim \text{GRDPG}(X)$ as in Definition 1 and $A_S$ be the principal submatrix of $A$ corresponding to the subset $S$ of size $m$ chosen uniformly at random from $[n]$. Denote by $\hat{X}_S$ the adjacency spectral embedding of $A_S$ and $\hat{X}_{PS}$ the estimate of $X$ as obtained in We therefore have for the same orthogonal transformation $W$.

Figures (1)

Figure 1: PredSub Schematic

Theorems & Definitions (14)

Definition 1: Generalized Random Dot Product Graph
Theorem 2
Theorem 3
Corollary 4
Definition 5
Theorem 6
Theorem 7
Remark 8
Theorem 9
Remark 10
...and 4 more
