
Variance-Reduced Accelerated First-order Methods: Central Limit Theorems and Confidence Statements

Jinlong Lei, Uday V. Shanbhag

Abstract

In this paper, we study a stochastic strongly convex optimization problem and propose three classes of variable sample-size stochastic first-order methods: the standard stochastic gradient descent method, its accelerated variant, and the stochastic heavy ball method. In each scheme, the unavailable exact gradient is approximated by averaging an increasing batch of sampled gradients. We prove that when the sample size increases geometrically, the generated estimates converge in mean to the optimal solution at a geometric rate. Based on this result, we provide a unified framework to show that the rescaled estimation errors converge in distribution to a normal distribution whose covariance matrix depends on the Hessian matrix, the covariance of the gradient noise, and the steplength. If the sample size increases at a polynomial rate, we show that the estimation errors decay at the corresponding polynomial rate and establish the corresponding central limit theorems (CLTs). Finally, we provide an avenue to construct confidence regions for the optimal solution based on the established CLTs, and test the theoretical findings on a stochastic parameter estimation problem.
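
The three variable sample-size schemes can be sketched on a toy quadratic. This is a minimal illustration under stated assumptions, not the paper's algorithms: the objective $f(x)=\tfrac12 x^\top A x - b^\top x$, Gaussian gradient noise, the batch schedule $N_k=\lceil \rho^{-k}\rceil$, and the textbook constant-momentum Nesterov and Polyak heavy-ball parameters are all assumptions, and the helper names `batch_size`, `noisy_grad`, and `run` are hypothetical. Because the noise is Gaussian, the average of $N_k$ draws is sampled directly from $\mathcal N(0,\sigma^2 I/N_k)$ instead of averaging $N_k$ vectors.

```python
import numpy as np

def batch_size(k, rho):
    # Assumed geometric schedule N_k = ceil(rho^{-k}) with 0 < rho < 1.
    return int(np.ceil(rho ** (-k)))

def noisy_grad(A, b, x, n, sigma, rng):
    # Exact gradient A x - b plus the average of n i.i.d. N(0, sigma^2 I) draws.
    # For Gaussian noise that average is exactly N(0, sigma^2 / n * I), so sample it directly.
    return A @ x - b + rng.normal(0.0, sigma / np.sqrt(n), size=x.shape)

def run(method, A, b, x0, iters, rho, sigma, seed=0):
    rng = np.random.default_rng(seed)
    eigs = np.linalg.eigvalsh(A)
    eta, L = eigs[0], eigs[-1]  # strong convexity and smoothness constants of f
    q = (np.sqrt(L) - np.sqrt(eta)) / (np.sqrt(L) + np.sqrt(eta))
    x_prev, x = x0.copy(), x0.copy()
    for k in range(iters):
        n = batch_size(k, rho)
        if method == "sgd":
            # Plain stochastic gradient step with alpha = 2 / (eta + L).
            x_new = x - 2.0 / (eta + L) * noisy_grad(A, b, x, n, sigma, rng)
        elif method == "nesterov":
            # Extrapolate with momentum q, then take a 1/L step at the extrapolated point.
            y = x + q * (x - x_prev)
            x_new = y - noisy_grad(A, b, y, n, sigma, rng) / L
        elif method == "heavy_ball":
            # Polyak's heavy ball with the parameters that are optimal for quadratics.
            alpha = 4.0 / (np.sqrt(L) + np.sqrt(eta)) ** 2
            x_new = x - alpha * noisy_grad(A, b, x, n, sigma, rng) + q**2 * (x - x_prev)
        else:
            raise ValueError(f"unknown method: {method}")
        x_prev, x = x, x_new
    return x

A = np.diag([1.0, 2.0, 4.0, 7.0, 10.0])  # eta = 1, L = 10
b = np.ones(5)
x_star = np.linalg.solve(A, b)
x0 = np.zeros(5)
errors = {
    m: float(np.linalg.norm(run(m, A, b, x0, iters=80, rho=0.8, sigma=1.0) - x_star))
    for m in ("sgd", "nesterov", "heavy_ball")
}
print(errors)
```

With $\rho=0.8$ the averaged noise at iteration $k$ has standard deviation $\sigma\rho^{k/2}$ per coordinate, so after 80 iterations each method's error should be small (roughly a modest multiple of $\rho^{40}\approx 1.3\times10^{-4}$), while the initial error $\|x^*\|$ is above 1.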

Paper Structure

This paper contains 30 sections, 30 theorems, 210 equations, 10 figures, 1 table, 3 algorithms.

Key Result

Lemma 1

Let Assumption ass-fun hold. Consider Algorithm Alg_1 with $\alpha \in \left(0, \frac{2}{\eta+L}\right]$. Then
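
The conclusion of Lemma 1 is cut off in this summary, so the sketch below does not reproduce it. It only illustrates, for a deterministic quadratic whose Hessian eigenvalues lie in $[\eta, L]$ (an assumption: $\eta$ is read as the strong convexity modulus and $L$ as the gradient Lipschitz constant), why $2/(\eta+L)$ is a natural endpoint for the steplength. For every $\alpha \in (0, 2/(\eta+L)]$ the map $x \mapsto x - \alpha A x$ contracts by exactly $1-\alpha\eta$; at the endpoint this equals $(L-\eta)/(L+\eta)$; beyond it the largest eigenvalue takes over. The function name `contraction_factor` is hypothetical.

```python
import numpy as np

def contraction_factor(eigs, alpha):
    # Spectral norm of I - alpha*A for a symmetric A with the given eigenvalues.
    return float(np.max(np.abs(1.0 - alpha * np.asarray(eigs))))

eta, L = 1.0, 10.0
eigs = np.linspace(eta, L, 50)  # Hessian spectrum covering [eta, L]
alpha_max = 2.0 / (eta + L)

# Inside (0, 2/(eta+L)] the smallest eigenvalue is the bottleneck.
for alpha in np.linspace(alpha_max / 10, alpha_max, 5):
    print(f"alpha={alpha:.4f}  factor={contraction_factor(eigs, alpha):.4f}  1-alpha*eta={1 - alpha * eta:.4f}")

# At the endpoint the factor equals (L - eta) / (L + eta).
print(contraction_factor(eigs, alpha_max), (L - eta) / (L + eta))

# Past the endpoint the largest eigenvalue dominates and the factor grows again.
print(contraction_factor(eigs, 0.25), 1 - 0.25 * eta)
```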

Figures (10)

  • Figure 1: Linear Rate
  • Figure 2: Iteration Complexity
  • Figure 3: Oracle Complexity
  • Figure 4: Histograms of $\rho^{-k/2}(x_k^5-x^*)$ at $k=50$ along with fitted normal distributions
  • Figure 5: Histograms of $\rho^{-k}(f(x_k)-f(x^*))$ at $k=50$
  • ...and 5 more figures
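
The rescaling in Figures 4 and 5 can be reproduced on a one-dimensional toy model. This is a minimal sketch under assumptions not taken from the paper: $f(x)=\tfrac h2 (x-x^*)^2$, Gaussian gradient noise with variance $\sigma^2$, batch sizes $N_k=\lceil\rho^{-k}\rceil$, and plain stochastic gradient steps (the function name `rescaled_errors` is hypothetical). Writing $e_k = x_k - x^*$ and $z_k=\rho^{-k/2}e_k$, the recursion $e_{k+1}=(1-\alpha h)e_k-\alpha\bar w_k$ with $\operatorname{Var}(\bar w_k)\approx\sigma^2\rho^{k}$ gives $\operatorname{Var}(z_{k+1})\approx \tfrac{(1-\alpha h)^2}{\rho}\operatorname{Var}(z_k)+\tfrac{\alpha^2\sigma^2}{\rho}$. When $(1-\alpha h)^2<\rho$ the variance settles at $V=\alpha^2\sigma^2/(\rho-(1-\alpha h)^2)$, which depends on the Hessian $h$, the noise variance $\sigma^2$, the steplength $\alpha$, and $\rho$. The code checks this by Monte Carlo and also reports the coverage of the oracle interval $x_k \pm 1.96\,\rho^{k/2}\sqrt V$, which uses the known $V$ rather than an estimate.

```python
import numpy as np

def rescaled_errors(h, alpha, rho, sigma, k, reps, seed=0):
    # Simulate `reps` independent runs of variable sample-size SGD on f(x) = h/2 (x - x*)^2
    # and return z_k = rho^{-k/2} (x_k - x*) for each run.
    rng = np.random.default_rng(seed)
    e = np.ones(reps)  # e_0 = x_0 - x* = 1 in every run
    for j in range(k):
        n = np.ceil(rho ** (-j))  # batch size N_j = ceil(rho^{-j})
        # Average of n Gaussian noise draws, sampled directly: N(0, sigma^2 / n).
        w_bar = rng.normal(0.0, sigma / np.sqrt(n), size=reps)
        e = (1.0 - alpha * h) * e - alpha * w_bar  # x_{j+1} = x_j - alpha * (h e_j + w_bar)
    return rho ** (-k / 2) * e

h, alpha, rho, sigma, k = 1.0, 0.5, 0.8, 1.0, 50
z = rescaled_errors(h, alpha, rho, sigma, k, reps=20000)
V = alpha**2 * sigma**2 / (rho - (1.0 - alpha * h) ** 2)  # predicted limit variance
coverage = np.mean(np.abs(z) <= 1.96 * np.sqrt(V))  # oracle 95% interval contains x*
print(f"mean={z.mean():.4f}  var={z.var():.4f}  predicted V={V:.4f}  coverage={coverage:.3f}")
```

For this scalar quadratic, $\rho^{-k}(f(x_k)-f(x^*)) = \tfrac h2 z_k^2$, so in the limit it is $\tfrac h2 V$ times a chi-square variable with one degree of freedom rather than a normal one.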

Theorems & Definitions (41)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Proposition 1: Rate and Oracle Complexity for Algorithm Alg_1
  • Lemma 5
  • Proposition 2: Rate and Oracle Complexity for Algorithm Alg_2
  • Proposition 3: Rate and Oracle Complexity for Algorithm Alg_3 on Quadratic Functions
  • Proposition 4: Rate for Algorithm Alg_3 on Non-Quadratic Functions
  • Remark 1
  • ...and 31 more
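
Propositions 3 and 4 treat Alg_3 separately on quadratic and non-quadratic functions. Assuming Alg_3 is the stochastic heavy ball method (the third scheme named in the abstract; this mapping is an assumption), the sketch below shows the classical quadratic fact behind such a split, for the deterministic iteration only. With Polyak's parameters $\alpha = 4/(\sqrt L+\sqrt\eta)^2$ and $\beta=\big((\sqrt L-\sqrt\eta)/(\sqrt L+\sqrt\eta)\big)^2$, every eigen-direction of a quadratic with Hessian spectrum in $[\eta,L]$ contracts at rate $\sqrt\beta=(\sqrt\kappa-1)/(\sqrt\kappa+1)$ with $\kappa=L/\eta$, versus $(\kappa-1)/(\kappa+1)$ for a gradient step with $\alpha=2/(\eta+L)$. The names `heavy_ball_rate` and `gd_rate` are hypothetical.

```python
import numpy as np

def heavy_ball_rate(eigs, alpha, beta):
    # Largest spectral radius, over Hessian eigenvalues lam, of the 2x2 map
    # (x_{k+1}, x_k) = M (x_k, x_{k-1}) for x_{k+1} = x_k - alpha*lam*x_k + beta*(x_k - x_{k-1}).
    rates = []
    for lam in eigs:
        M = np.array([[1.0 + beta - alpha * lam, -beta], [1.0, 0.0]])
        rates.append(np.max(np.abs(np.linalg.eigvals(M))))
    return float(max(rates))

def gd_rate(eigs, alpha):
    # Contraction factor of x -> x - alpha*A x on the same spectrum.
    return float(np.max(np.abs(1.0 - alpha * np.asarray(eigs))))

eta, L = 1.0, 100.0
kappa = L / eta
eigs = np.linspace(eta, L, 200)
alpha_hb = 4.0 / (np.sqrt(L) + np.sqrt(eta)) ** 2
beta = ((np.sqrt(L) - np.sqrt(eta)) / (np.sqrt(L) + np.sqrt(eta))) ** 2

print("heavy ball:", heavy_ball_rate(eigs, alpha_hb, beta), (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1))
print("gradient  :", gd_rate(eigs, 2.0 / (eta + L)), (kappa - 1) / (kappa + 1))
```

With $\kappa = 100$ this gives about $0.818$ for heavy ball versus about $0.980$ for the gradient step. The argument relies on a fixed Hessian, which is why it does not carry over directly to non-quadratic functions.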