Asymptotic Time-Uniform Inference for Parameters in Averaged Stochastic Approximation

Chuhan Xie, Kaicheng Jin, Jiadong Liang, Zhihua Zhang

TL;DR

This work analyzes the almost-sure convergence rates of the averaged iterates to a scaled sum of Gaussians in both linear and nonlinear SA problems, and constructs three types of asymptotic confidence sequences that are valid uniformly across all times with coverage guarantees.

Abstract

We study time-uniform statistical inference for parameters in stochastic approximation (SA), which encompasses a wide range of applications in optimization and machine learning. To that end, we analyze the almost-sure convergence rates of the averaged iterates to a scaled sum of Gaussians in both linear and nonlinear SA problems. We then construct three types of asymptotic confidence sequences that are valid uniformly across all times with coverage guarantees, in the asymptotic sense that the starting time is sufficiently large. These coverage guarantees remain valid if the unknown covariance matrix is replaced by its plug-in estimator, and we conduct experiments to validate our methodology.
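To make the objects in the abstract concrete, here is a minimal sketch of averaged SA on linear regression (one of the simulation settings in Figure 5.1): SGD with Polyak-Ruppert averaging (a running mean of the iterates), followed by a plug-in estimate of the sandwich covariance $H^{-1} S H^{-1}$. The step-size schedule $\eta_t = \eta_0 t^{-a}$, the hypothetical function name `averaged_sgd_linreg`, and the specific plug-in estimators (empirical second moment of the features for $H$, empirical second moment of per-sample gradients at the averaged iterate for $S$) are our assumptions, not necessarily the paper's choices.

```python
import numpy as np

def averaged_sgd_linreg(X, y, eta0=0.1, a=0.6):
    """Hypothetical sketch: Polyak-Ruppert averaged SGD for least squares.

    Assumed step size: eta_t = eta0 * t**(-a), with a in (1/2, 1).
    Returns the averaged iterate and a plug-in estimate of H^{-1} S H^{-1},
    the asymptotic covariance of sqrt(n) * (theta_bar - theta_star).
    """
    n, d = X.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for t in range(1, n + 1):
        x, y_t = X[t - 1], y[t - 1]
        grad = (x @ theta - y_t) * x            # stochastic gradient of 0.5 * (x'theta - y)^2
        theta = theta - eta0 * t ** (-a) * grad
        theta_bar += (theta - theta_bar) / t    # running average of the iterates
    # Plug-in estimators, computed in one pass after the fact for simplicity.
    resid = X @ theta_bar - y
    H_hat = X.T @ X / n                         # estimate of the Hessian H
    grads = resid[:, None] * X                  # per-sample gradients at theta_bar
    S_hat = grads.T @ grads / n                 # estimate of the gradient-noise covariance S
    H_inv = np.linalg.inv(H_hat)
    return theta_bar, H_inv @ S_hat @ H_inv

# Usage on synthetic data (made-up parameters).
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((20_000, 3))
y = X @ theta_star + 0.5 * rng.standard_normal(20_000)
theta_bar, Sigma_hat = averaged_sgd_linreg(X, y)
se = np.sqrt(np.diag(Sigma_hat) / len(y))       # plug-in standard errors for theta_bar
print(theta_bar, se)
```

An online variant would update `H_hat` and `S_hat` incrementally alongside the iterates instead of revisiting the data.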

Paper Structure

This paper contains 45 sections, 31 theorems, 106 equations, 4 figures, 1 table.

Key Result

Theorem 3.1

Let $\epsilon>0$ be an arbitrarily small constant. Under the assumptions ranging from the Lyapunov assumption to the noise assumption, there exists a sequence of $d$-dimensional i.i.d. Gaussian random vectors $\{{\boldsymbol G}_t\}_{t\geq 1}$ with mean zero and covariance ${\boldsymbol H}^{-1}{\boldsymbol S} {\boldsymbol H}^{-1}$ on the s[…] where
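Theorem 3.1 approximates the averaged iterates by a scaled sum of i.i.d. Gaussian vectors ${\boldsymbol G}_t$ with the sandwich covariance ${\boldsymbol H}^{-1}{\boldsymbol S}{\boldsymbol H}^{-1}$. The sketch below builds that matrix with two linear solves instead of an explicit inverse and draws such vectors; the matrices `H` and `S` are made-up illustrative values, and the helper `sandwich_cov` is hypothetical.

```python
import numpy as np

def sandwich_cov(H, S):
    """Compute H^{-1} S H^{-1} via two linear solves (no explicit inverse)."""
    A = np.linalg.solve(H, S)            # A = H^{-1} S
    return np.linalg.solve(H.T, A.T).T   # (H^{-T} A^T)^T = A H^{-1}

# Illustrative (made-up) matrices H and S.
H = np.array([[2.0, 0.5], [0.5, 1.0]])
S = np.array([[1.0, 0.2], [0.2, 0.5]])
Sigma = sandwich_cov(H, S)

# Draw i.i.d. G_1, ..., G_T ~ N(0, Sigma) and compare the empirical covariance.
rng = np.random.default_rng(1)
G = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)
print(np.round(Sigma, 3))
print(np.round(np.cov(G, rowvar=False), 3))
```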

Figures (4)

  • Figure 3.1: Almost-sure approximation rates for all error terms with respect to the choice of $a$ in both linear (Left) and nonlinear (Right) SA settings. Here ${\mathcal{S}}_1$ denotes the initialization error, ${\mathcal{S}}_2$ the linearization error (absent in the linear case), ${\mathcal{S}}_3$ the matrix-inverse approximation error, ${\mathcal{S}}_4$ the covariance calibration error, and ${\mathcal{S}}_5$ the error due to the almost-sure invariance principles.
  • Figure 4.1: The shapes of three confidence regions at time $t\in \{10, 20, 100, 500\}$ in a 2D example.
  • Figure 5.1: Simulation results for $d=1$. The first row corresponds to linear regression and the second row corresponds to logistic regression. The first column shows fixed-time coverage rates of three CSs as well as the traditional CI ("fixed"); the second/third column displays time-uniform coverage rates without/with the plug-in variance estimator; the fourth/fifth column displays CS boundaries.
  • Figure 5.2: Simulation results for $d=5$, plotted for the first component. Meanings of all plots are the same as in Figure 5.1.
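Figures 5.1 and 5.2 contrast the time-uniform coverage of confidence sequences with that of the traditional fixed-time CI. The Monte Carlo sketch below reproduces that contrast in the simplest possible setting, i.i.d. $N(0,\sigma^2)$ data with known $\sigma$, rather than SA iterates. The CS used here is Robbins' classical two-sided normal-mixture boundary with tuning parameter $\rho$ (set to 1 arbitrarily); it is a standard time-uniform boundary for Gaussian sums and is not claimed to be any of the paper's three CSs. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def mixture_radius(t, sigma, alpha=0.05, rho=1.0):
    """Half-width of Robbins' two-sided normal-mixture confidence sequence
    for the mean of i.i.d. N(mu, sigma^2) data after t observations."""
    v = t * rho**2 + 1.0
    return sigma / t * np.sqrt(v / rho**2 * np.log(v / alpha**2))

def time_uniform_coverage(n_reps=2000, T=1000, sigma=1.0, alpha=0.05, seed=0):
    """Fraction of simulated paths on which each interval family covers mu = 0
    simultaneously for all t = 1, ..., T."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    fixed_radius = z * sigma / np.sqrt(t)        # pointwise CI: valid at each fixed t only
    cs_radius = mixture_radius(t, sigma, alpha)  # time-uniform by Ville's inequality
    fixed_ok = cs_ok = 0
    for _ in range(n_reps):
        means = np.cumsum(sigma * rng.standard_normal(T)) / t  # running sample means
        fixed_ok += np.all(np.abs(means) <= fixed_radius)
        cs_ok += np.all(np.abs(means) <= cs_radius)
    return fixed_ok / n_reps, cs_ok / n_reps

fixed_cov, cs_cov = time_uniform_coverage()
print(f"fixed-time CI: {fixed_cov:.3f}, mixture CS: {cs_cov:.3f}")
```

With $\alpha = 0.05$ the mixture radius exceeds $z_{1-\alpha/2}\sigma/\sqrt{t}$ at every $t$, so any path covered by the fixed-time CIs is also covered by the CS. Repeated peeking erodes the fixed CI's simultaneous coverage well below $1-\alpha$, while the CS keeps it at or above $1-\alpha$.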

Theorems & Definitions (39)

  • Remark 2.1
  • Theorem 3.1: Gaussian approximation
  • Corollary 3.2: Linear SA
  • Corollary 3.3: Nonlinear SA
  • Proposition 4.1: Time-Uniform Coverage for Gaussians
  • Remark 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Theorem 4.4: Asymptotic Time-Uniform Coverage
  • Lemma A.1: Approximation of the averaged iterate process
  • ...and 29 more
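Proposition 4.1 concerns time-uniform coverage for Gaussians. One classical way to obtain such a guarantee, sketched here for the scalar case and not necessarily the route taken in the paper, is to mix exponential martingales over a $N(0,\rho^2)$ prior on $\lambda$ and apply Ville's inequality. For i.i.d. $Z_i \sim N(0,1)$ and $S_t = \sum_{i\le t} Z_i$, each $M_t(\lambda)$ below is a nonnegative martingale with $M_0(\lambda)=1$, and so is the mixture $M_t$:

```latex
\begin{aligned}
M_t(\lambda) &= \exp\!\Big(\lambda S_t - \tfrac{\lambda^2 t}{2}\Big),\\
M_t &= \int_{\mathbb{R}} M_t(\lambda)\,\frac{1}{\sqrt{2\pi\rho^2}}\,e^{-\lambda^2/(2\rho^2)}\,d\lambda
     = \frac{1}{\sqrt{1+\rho^2 t}}\exp\!\Big(\frac{\rho^2 S_t^2}{2(1+\rho^2 t)}\Big),\\
\mathbb{P}\big(\exists\, t\ge 1:\ M_t \ge 1/\alpha\big) &\le \alpha \qquad \text{(Ville's inequality)},\\
M_t \ge 1/\alpha \iff |S_t| &\ge \sqrt{\frac{1+\rho^2 t}{\rho^2}\,\log\frac{1+\rho^2 t}{\alpha^2}}.
\end{aligned}
```

Hence, with probability at least $1-\alpha$, $|S_t|/t < \frac{1}{t}\sqrt{\frac{1+\rho^2 t}{\rho^2}\log\frac{1+\rho^2 t}{\alpha^2}}$ holds simultaneously for all $t \ge 1$; multiplying the radius by $\sigma$ gives a confidence sequence for the mean of $N(\mu,\sigma^2)$ data.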