Predictive Subsampling for Scalable Inference in Networks

Arpan Kumar, Minh Tang, Srijan Sengupta

TL;DR

This work introduces a subsampling-based approach that reduces the computational burden of estimation and two-sample hypothesis testing on large networks. The methodology is developed under the generalized random dot product graph framework, which affords broad applicability and permits rigorous analysis.

Abstract

Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis testing are crucial. However, the size of modern networks often exceeds the storage and computational capacities of existing methods, making timely, statistically rigorous inference difficult. In this work, we introduce a subsampling-based approach aimed at reducing the computational burden associated with estimation and two-sample hypothesis testing. Our strategy involves selecting a small random subset of nodes from the network, conducting inference on the resulting subgraph, and then using interpolation based on the observed connections between the subsample and the rest of the nodes to estimate the entire graph. We develop the methodology under the generalized random dot product graph framework, which affords broad applicability and permits rigorous analysis. Within this setting, we establish consistency guarantees and corroborate the practical effectiveness of the approach through comprehensive simulation studies.
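The strategy described in the abstract (subsample nodes, embed the subgraph, interpolate the remaining nodes) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' algorithm: the least-squares out-of-sample step is a standard technique swapped in for the paper's interpolation, whose exact form is not given here, and the function names `ase` and `predsub_sketch`, the two-block example, and all parameter values are hypothetical.

```python
import numpy as np

def ase(A, d):
    """Adjacency spectral embedding: keep the d eigenpairs largest in magnitude."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:d]
    vals, vecs = vals[idx], vecs[:, idx]
    # Scale eigenvectors by sqrt(|eigenvalue|); the signs give the (p, q) signature.
    return vecs * np.sqrt(np.abs(vals)), np.sign(vals)

def predsub_sketch(A, m, d, rng):
    """Embed a random m-node subgraph, then interpolate the remaining nodes.

    Assumption: interpolation is ordinary least squares on the edges between
    the subsample S and each remaining node (not necessarily the paper's rule).
    """
    n = A.shape[0]
    S = rng.choice(n, size=m, replace=False)      # uniform subsample of nodes
    rest = np.setdiff1d(np.arange(n), S)
    X_S, signs = ase(A[np.ix_(S, S)], d)          # inference on the subgraph only
    # E[A[S, i]] = X_S I_pq x_i, so solve X_S c = A[S, i] and set x_i = I_pq c.
    coef, *_ = np.linalg.lstsq(X_S, A[np.ix_(S, rest)], rcond=None)
    X_hat = np.empty((n, d))
    X_hat[S] = X_S
    X_hat[rest] = coef.T * signs
    return X_hat, signs, S

# Hypothetical example: a two-block graph whose block matrix has one positive
# and one negative eigenvalue, so it is a GRDPG with signature (1, 1).
rng = np.random.default_rng(0)
n, m, d = 800, 300, 2
B = np.array([[0.1, 0.6], [0.6, 0.1]])
z = rng.integers(0, 2, size=n)                    # block labels
P = B[z][:, z]                                    # edge-probability matrix
upper = np.triu(rng.random((n, n)) < P, k=1)      # sample edges above the diagonal
A = (upper | upper.T).astype(float)               # symmetric, hollow adjacency matrix

X_hat, signs, S = predsub_sketch(A, m, d, rng)
P_hat = (X_hat * signs) @ X_hat.T                 # X_hat I_pq X_hat^T
rel_err = np.linalg.norm(P_hat - P) / np.linalg.norm(P)
print(f"relative Frobenius error of P_hat: {rel_err:.3f}")
```

Only the $m \times m$ subgraph is eigendecomposed; the remaining $n - m$ nodes cost a single least-squares solve against the $m \times d$ embedding, which is where the computational saving in this sketch comes from.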

Paper Structure

This paper contains 24 sections, 9 theorems, 110 equations, 1 figure, 6 tables, 3 algorithms.

Key Result

Theorem 2

Let $A \sim \text{GRDPG}(X)$ as in Definition 1, and let $A_S$ be the principal submatrix of $A$ corresponding to a subset $S$ of size $m$ chosen uniformly at random from $[n]$. Denote by $\hat{X}_S$ the adjacency spectral embedding of $A_S$ and by $\hat{X}_{PS}$ the estimate of $X$ as obtained in ... We therefore have ... for the same orthogonal transformation $W$.

Figures (1)

  • Figure 1: PredSub Schematic

Theorems & Definitions (14)

  • Definition 1: Generalized Random Dot Product Graph
  • Theorem 2
  • Theorem 3
  • Corollary 4
  • Definition 5
  • Theorem 6
  • Theorem 7
  • Remark 8
  • Theorem 9
  • Remark 10
  • ...and 4 more