Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

Michał Dereziński

TL;DR

This work accelerates finite-sum convex minimization by combining variance reduction with stochastic Newton updates. The author proves that Stochastic Variance-Reduced Newton (SVRN) achieves a faster sequential convergence rate, reducing the number of data passes to $O\left(\frac{\log(1/\varepsilon)}{\log(n)}\right)$ while keeping the parallel cost low through large-batch, Hessian-based updates; the improvement grows with the dataset size $n$. The paper also introduces SVRN-HA, a globally convergent variant that runs a phase with Hessian averaging and line search followed by an SVRN phase, so that it converges from arbitrary initial points. Experiments on logistic regression and least-squares problems show substantial reductions in data passes and competitive wall-clock time, indicating that SVRN is practical for large-scale second-order optimization with variance reduction.

Abstract

Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Incorporating second-order information has proven helpful in further improving the performance of these first-order methods. Yet, comparatively little is known about the benefits of using variance reduction to accelerate popular stochastic second-order methods such as Subsampled Newton. To address this, we propose Stochastic Variance-Reduced Newton (SVRN), a finite-sum minimization algorithm that provably accelerates existing stochastic Newton methods from $O(\alpha\log(1/\varepsilon))$ to $O\big(\frac{\log(1/\varepsilon)}{\log(n)}\big)$ passes over the data, i.e., by a factor of $O(\alpha\log(n))$, where $n$ is the number of sum components and $\alpha$ is the approximation factor in the Hessian estimate. Surprisingly, this acceleration gets more significant the larger the data size $n$, which is a unique property of SVRN. Our algorithm retains the key advantages of Newton-type methods, such as easily parallelizable large-batch operations and a simple unit step size. We use SVRN to accelerate Subsampled Newton and Iterative Hessian Sketch algorithms, and show that it compares favorably to popular first-order methods with variance reduction.
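To make the idea in the abstract concrete, here is a minimal sketch of an SVRN-style loop on a least-squares objective. It is an illustration under stated assumptions, not the paper's exact algorithm: the function name `svrn_least_squares` and its parameters are hypothetical, the gradient estimator is assumed to be the usual SVRG-style one (mini-batch gradient difference plus the full gradient at a snapshot), mini-batches are drawn uniformly without replacement, and the subsampled Hessian is formed once because it does not depend on the iterate for least squares.

```python
import numpy as np


def svrn_least_squares(A, b, x0, hess_sample, batch, outer_iters=10, seed=0):
    """Illustrative SVRN-style loop for f(x) = (1/2n) * ||A x - b||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner_iters = max(1, n // batch)  # roughly one data pass per outer stage

    def grad(rows, x):
        # Average gradient of the selected sum components at x.
        A_s = A[rows]
        return A_s.T @ (A_s @ x - b[rows]) / len(rows)

    # Subsampled Hessian estimate. For least squares it does not depend on x,
    # so it is formed once (an assumption made to keep the sketch short).
    rows_h = rng.choice(n, size=hess_sample, replace=False)
    H = A[rows_h].T @ A[rows_h] / hess_sample

    all_rows = np.arange(n)
    x_snap = np.asarray(x0, dtype=float).copy()
    for _ in range(outer_iters):
        g_snap = grad(all_rows, x_snap)  # full gradient at the snapshot
        x = x_snap.copy()
        for _ in range(inner_iters):
            rows = rng.choice(n, size=batch, replace=False)
            # Variance-reduced gradient: mini-batch difference + full gradient.
            g = grad(rows, x) - grad(rows, x_snap) + g_snap
            # Newton-type step with unit step size.
            x = x - np.linalg.solve(H, g)
        x_snap = x
    return x_snap
```

Each inner step touches only `batch` rows, so the `n // batch` inner steps of one outer stage cost about one pass over the data, while the full gradient at the snapshot is a single large-batch operation that parallelizes easily.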

Paper Structure (31 sections, 12 theorems, 56 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Main result
Discussion
Comparison to SVRG and Katyusha.
Comparison to Stochastic Newton.
Accelerating SVRN with sketching and importance sampling
Further related work
Local convergence analysis of SVRN
Notation.
Discussion.
Complexity analysis.
Globally convergent algorithm
Experiments
Logistic regression experiment
Further investigations on a least squares task
...and 16 more sections

Key Result

Theorem 1

Suppose that Assumption a:convex holds and: (a) $f$ has a Lipschitz Hessian, or (b) $f$ is self-concordant. Moreover, let $n\gg\kappa\gg\alpha$. There is an algorithm (SVRN) and an open neighborhood $U$ such that, given any $\mathbf x\in U$ with a corresponding Hessian $\alpha$-approximation as in e

Figures (5)

Figure 1: Illustration of the local convergence complexity analysis for SVRN, as a function of the mini-batch size $m$, with the number of inner iterations set to $t_{\max}=n/m$. As we decrease the mini-batch size from $n$ (standard Stochastic Newton; SN) down to $m\approx\frac{n}{\alpha\log(n/\kappa)}$ (optimal SVRN), the sequential complexity (number of passes over the data) improves by $O(\alpha\log(n))$, while the parallel complexity (number of batch gradient queries) remains optimal.
Figure 2: Convergence and runtime comparison of SVRN-HA against three baselines: classical Newton, SVRG (after parameter tuning), and Subsampled Newton with Hessian Averaging (SN-HA), i.e., the initial phase of SVRN-HA run without variance reduction all the way through.
Figure 3: How different types of gradient estimation affect the convergence properties of SVRN.
Figure 4: How Hessian sample size and data coherence affect the convergence properties of SVRN.
Figure 5: Convergence comparison of SVRN-HA against SN-HA and Newton for a synthetic logistic regression task as we vary the condition number of the data matrix, and for the CIFAR-10 dataset.
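
The batch-size rule in the Figure 1 caption can be turned into a small calculation. The sketch below is only a rough guide: the caption gives $m\approx\frac{n}{\alpha\log(n/\kappa)}$ up to unspecified constants, so treating the constant as 1 is an assumption, and the helper name `svrn_batch_schedule` is hypothetical.

```python
import math


def svrn_batch_schedule(n, kappa, alpha):
    """Return (m, t_max) from m ~ n / (alpha * log(n / kappa)), t_max = n / m.

    Assumes the hidden constant in the approximation is 1 and n > kappa >= 1.
    """
    if not (n > kappa >= 1) or alpha <= 0:
        raise ValueError("expected n > kappa >= 1 and alpha > 0")
    m = round(n / (alpha * math.log(n / kappa)))
    m = max(1, min(n, m))      # keep the batch size within [1, n]
    t_max = max(1, n // m)     # inner iterations per outer stage
    return m, t_max


m, t_max = svrn_batch_schedule(n=1_000_000, kappa=100, alpha=2)
print(m, t_max)
```

With $t_{\max}=n/m$, the inner loop of one outer stage processes about $n$ sum components in total, which is why the caption counts each stage as roughly one pass over the data.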

Theorems & Definitions (16)

Theorem 1: informal version of the SVRN convergence theorem
Remark 1
Remark 2
Theorem 2: Fast least squares solver
Definition 1
Theorem 3: Convergence rate of SVRN
Remark 3
Theorem 4
Lemma 1
Lemma 2
...and 6 more
