A Framework for Statistical Inference via Randomized Algorithms

Zhixiang Zhang; Sokbae Lee; Edgar Dobriban

TL;DR

This work develops a general statistical-inference framework for outputs of randomized algorithms, treating the data as deterministic and the randomness as arising from algorithmic procedures such as sketching and stochastic optimization. It introduces three main inference methods—sub-randomization, multi-run plug-in, and multi-run aggregation—along with an asymptotically pivotal baseline, and demonstrates how to apply them to least-squares problems under growing-dimension regimes and to stochastic optimization, including SGD with Polyak-Ruppert averaging and momentum methods. The results establish conditions under which valid confidence regions can be constructed with modest overhead, quantify bias and variance properties for different sketching schemes (i.i.d. and Haar), and provide extensive simulations showing competitive coverage and interval lengths. The framework offers practical guidelines for balancing accuracy and computation in large-scale data analysis, with extensions to iterative sketching and PCA, and broad applicability to stochastic approximation with dependent data. The work also includes a thorough cost analysis and discusses data-access considerations for scalable, parallel inference.

Abstract

Randomized algorithms, such as randomized sketching or stochastic optimization, are a promising approach to ease the computational burden in analyzing large datasets. However, randomized algorithms also produce non-deterministic outputs, leading to the problem of evaluating their accuracy. In this paper, we develop a statistical inference framework for quantifying the uncertainty of the outputs of randomized algorithms. Our key conclusion is that one can perform statistical inference for the target of a sequence of randomized algorithms as long as in the limit, their outputs fluctuate around the target according to any (possibly unknown) probability distribution. In this setting, we develop appropriate statistical inference methods -- sub-randomization, multi-run plug-in and multi-run aggregation -- by estimating the unknown parameters of the limiting distribution either using multiple runs of the randomized algorithm, or by tailored estimates. As illustrations, we develop methods for statistical inference when using stochastic optimization (such as Polyak-Ruppert averaging in stochastic gradient descent and stochastic optimization with momentum). We also illustrate our methods in inference for least squares parameters via randomized sketching, by characterizing the limiting distributions of sketching estimates in a possibly growing dimensional case. We further characterize the computation and communication cost of our methods, showing that in certain cases, they add negligible overhead. The results are supported via a broad range of simulations.
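As a rough illustration of the multi-run plug-in idea for sketch-and-solve least squares, the sketch below centers a normal interval at one sketched estimate and estimates its standard deviation from further independent runs. It is a minimal sketch under stated assumptions, not the paper's procedure: the function names (`sketch_and_solve`, `multi_run_plugin_ci`) are hypothetical, a Gaussian i.i.d. sketch is used, all runs share the same sketch size `m` (the paper also considers runs of a different size), and the normal limit with negligible bias is assumed rather than checked.

```python
import numpy as np
from statistics import NormalDist


def sketch_and_solve(X, y, m, rng):
    """Least squares on a Gaussian i.i.d. sketch (S X, S y) with m rows."""
    n = X.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    beta, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    return beta


def multi_run_plugin_ci(X, y, m, K, coord=0, alpha=0.10, seed=0):
    """Normal interval for one coordinate of the full-data least squares fit.

    Assumption (hypothetical simplification): the sketched estimate is
    approximately normal around the full-data solution, so its spread
    can be estimated from independent runs of the same size.
    """
    if K < 3:
        raise ValueError("need at least 3 runs to estimate a standard deviation")
    rng = np.random.default_rng(seed)
    runs = np.array([sketch_and_solve(X, y, m, rng)[coord] for _ in range(K)])
    center = runs[0]                 # the reported estimate
    sd = runs[1:].std(ddof=1)        # spread estimated from the other K-1 runs
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return center - z * sd, center + z * sd, runs
```

The data `X, y` are treated as fixed; the only randomness is in the sketching matrices, which matches the setting described in the abstract.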

Paper Structure (68 sections, 31 theorems, 348 equations, 15 figures, 12 tables, 2 algorithms)

Introduction
Related work
Contributions
General Framework
Asymptotically pivotal inference
Inference via sub-randomization
Sub-randomization inference under converging scale
Multi-run plug-in inference for a normal limit distribution
Inference by multi-run aggregation for nearly unbiased estimators
Examples
Sketch-and-solve least squares
Numerical simulations
Stochastic optimization and approximation
Our methods can be used for statistical inference via stochastic optimization
Polyak-Ruppert averaging for SGD
...and 53 more sections

Key Result

Proposition 2.1

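Consider a sequence of problems as defined above. Suppose that as $m,n\to\infty$, $\widehat{T}_{m,n}(\widehat{\theta}_{m}-\theta_n) \Rightarrow J$ for a known distribution $J$. For $\alpha\in (0,1)$, let $\Xi$ be a measurable set such that $J(\Xi)\geqslant 1-\alpha$. If $(\widehat{T}_{m,n})_{n\geqslant 1}$ is invertible with probability tending to unity and $\Xi$ is an open set, then $\liminf_{m,n\to\infty} P\left(\theta_n \in \widehat{\theta}_{m} - \widehat{T}_{m,n}^{-1}\Xi\right) \geqslant 1-\alpha$. Moreover, if $\Xi$ is a continuity set of $J$, then $\lim_{m,n\to\infty} P\left(\ldots\right)$ ...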
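To make the inversion step concrete, here is a minimal scalar sketch. It assumes the pivot is $\widehat{T}_{m,n}(\widehat{\theta}_{m}-\theta_n)$ (as in the caption of Figure 2), that the known limit $J$ is standard normal, and that $\Xi=(-z_{1-\alpha/2}, z_{1-\alpha/2})$; the function name `pivotal_interval` and these choices are illustrative assumptions, not part of the proposition.

```python
from statistics import NormalDist


def pivotal_interval(theta_hat, t_hat, alpha=0.10):
    """Invert a scalar pivot t_hat * (theta_hat - theta) with a N(0, 1) limit.

    The region is theta_hat - Xi / t_hat with Xi = (-z, z), which requires
    t_hat to be invertible (nonzero in the scalar case).
    """
    if t_hat == 0:
        raise ValueError("t_hat must be nonzero to invert the pivot")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z / abs(t_hat)
    return theta_hat - half_width, theta_hat + half_width


# Usage: an estimate of 1.0 with scaling 10 gives a 90% interval around 1.0.
lo, hi = pivotal_interval(1.0, 10.0, alpha=0.10)
```

Since this $\Xi$ is open and symmetric, the interval is simply the estimate plus or minus a normal quantile divided by the scaling.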

Figures (15)

Figure 1: Flowchart illustrating our proposed framework. We consider some large data set $z_n$, which we cannot access directly due to its size. Instead, we observe the output $Z_{m,n} = \mathcal{A}_m(z_n, S_{m,n})$ of a randomized algorithm, where $S_{m,n}$ is a source of randomness. We are interested in some parameter $\theta_n(z_n)$ of the unobserved data set, and aim to build a confidence region $C_m$ that contains this parameter with some pre-specified probability, so that $P(\theta_n(z_n) \in C_m) \geqslant 1-\alpha$, at least asymptotically. We propose several approaches to reach this goal. Some rely on generating additional smaller datasets $\{Z_{b,i} =\mathcal{A}_b(z_n, S_{b,i})\}_{i=1}^K$ by running the randomized algorithm repeatedly or in a distributed manner, and on using them to construct the estimate $L_{b,m,n}$ of the error distribution of the output of the randomized algorithm.
Figure 2: Methods for statistical inference via randomized algorithms, categorized by the conditions under which they are applicable. Here, $\widehat{J}_{m,n}$ is the distribution of ${\widehat{T}_{m,n}(\widehat{\theta}_{m}-\theta_n)}$, where the randomness is only due to $S_{m,n}$. We consider two sets of conditions: either that $\widehat{J}_{m,n}$ converges to a limiting distribution $J$, or that $\widehat{\theta}_{m}$ is nearly unbiased.
Figure 3: Left: Coverage of 90% intervals for the first coordinate of $\beta_n$, and the 95% Clopper-Pearson interval for the coverage, in a synthetic data example. Right: Length of the confidence intervals. We use sketch-and-solve estimators obtained via i.i.d. sketching, and data generated from the model in Case 1, with $p=500$, $n=8{,}000$, $b=600$, $K=100$, and 500 Monte Carlo trials for each setting.
Figure 4: Inference using momentum and vanilla SGD algorithms. Methods compared include sub-randomization and multi-run plug-in inference with varying learning rates. The learning rates are $\gamma_t = 0.4/(t+1)^a$, and momentum parameters are $1- \gamma_t$.
Figure 5: Time for generating $K$ small sketches of size $b=200$ with $X\in \mathbb{R}^{2^{17}\times 100}$: "Block" refers to the memory-efficient computation of $K$ sketches using data blocking, and "full" represents the naive method requiring loading the full data $K$ times. Loading time indicates the time of loading the data, and total time encompasses both loading and sketch computation.
...and 10 more figures
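The "block" strategy in the caption of Figure 5 can be illustrated as follows: each block of rows is loaded once and used to update all $K$ sketches, rather than loading the full data $K$ times. This is a minimal sketch under assumptions of our own: Gaussian i.i.d. sketching matrices, data supplied as an iterable of row blocks, and the hypothetical function name `k_sketches_blocked`. The implementation timed in the paper may differ.

```python
import numpy as np


def k_sketches_blocked(blocks, K, b, seed=0):
    """Compute K Gaussian sketches S_k X of size b in one pass over row blocks.

    blocks: iterable of 2-D arrays, the row blocks of X in order.
    Returns an array of shape (K, b, p).
    """
    rng = np.random.default_rng(seed)
    sketches = None
    for Xb in blocks:                      # each block is loaded only once
        rows, p = Xb.shape
        if sketches is None:
            sketches = np.zeros((K, b, p))
        # Columns of all K sketching matrices that act on these rows.
        S = rng.standard_normal((K, b, rows)) / np.sqrt(b)
        # Batched product: (K, b, rows) @ (rows, p) -> (K, b, p).
        sketches += S @ Xb
    return sketches
```

Because $S_k X = \sum_j S_k^{(j)} X^{(j)}$ over row blocks $j$, accumulating the per-block products gives the same sketches as a full-data product with the concatenated sketching matrix, while only one block needs to be in memory at a time.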

Theorems & Definitions (46)

Proposition 2.1: Classical asymptotically pivotal inference
Theorem 2.2: Inference via sub-randomization
Corollary 2.3: Sub-randomization inference under converging scale
Theorem 2.4: Multi-run plug-in inference for a normal limit
Corollary 2.5: Multi-run plug-in inference with centering and scaling estimated using different output sizes
Theorem 2.6: Inference by multi-run aggregation
Theorem 3.2: Distributions of estimators obtained via sketching with i.i.d. entries
Remark 3.3
Corollary 3.4: Simplified distributions of i.i.d. sketching estimators
Proposition 3.5: Variance estimation for Gaussian sketching
...and 36 more
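The sub-randomization idea named in Theorem 2.2 can be illustrated on a toy randomized algorithm. The sketch below is not the paper's example: the "algorithm" averages `size` entries drawn uniformly with replacement from a fixed array `z`, the target is the mean of `z`, and the scaling $\sqrt{\text{size}}$ is an assumption justified here by the central limit theorem. The names `subsample_mean` and `sub_randomization_ci` are hypothetical.

```python
import numpy as np


def subsample_mean(z, size, rng):
    """Toy randomized algorithm: mean of `size` uniformly drawn entries of z."""
    return z[rng.integers(0, z.size, size=size)].mean()


def sub_randomization_ci(z, m, b, K, alpha=0.10, seed=0):
    """Interval for mean(z) from one run of size m and K runs of size b < m.

    The empirical distribution of sqrt(b) * (theta_b_i - theta_m) stands in
    for the unknown distribution of sqrt(m) * (theta_m - mean(z)).
    """
    rng = np.random.default_rng(seed)
    theta_m = subsample_mean(z, m, rng)
    theta_b = np.array([subsample_mean(z, b, rng) for _ in range(K)])
    roots = np.sqrt(b) * (theta_b - theta_m)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    # Invert the root: theta lies in theta_m - [q_lo, q_hi] / sqrt(m).
    return theta_m - q_hi / np.sqrt(m), theta_m - q_lo / np.sqrt(m)
```

The data array stays fixed throughout; only the sampled indices are random. No distributional form for the limit is assumed beyond its existence, which is the appeal of this approach over a normal plug-in.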
