
Sketch-and-Project Meets Newton Method: Global $\mathcal O(k^{-2})$ Convergence with Low-Rank Updates

Slavomír Hanzely

TL;DR

This work addresses scalable second-order optimization for convex, self-concordant objectives by introducing SGN, a Sketchy Global Newton method that operates in random low-rank subspaces. SGN unifies sketch-and-project, subspace Newton, and subspace regularized Newton updates, delivering a global $\mathcal O(k^{-2})$ convergence rate while keeping per-iteration costs at $\mathcal O(d\tau^2)$ (and $\mathcal O(1)$ when $\tau=1$). It additionally provides fast local linear convergence independent of conditioning and a global linear convergence regime under relative smoothness/convexity, all under affine-invariant geometric assumptions. Empirical results on LIBSVM logistic-loss problems corroborate the theory, showing that SGN can match or approach the performance of state-of-the-art Newton-like methods with substantially cheaper updates.

Abstract

In this paper, we propose the first sketch-and-project Newton method with a fast $\mathcal O(k^{-2})$ global convergence rate for self-concordant functions. Our method, SGN, can be viewed in three ways: i) as a sketch-and-project algorithm projecting updates of the Newton method, ii) as a cubically regularized Newton method in sketched subspaces, and iii) as a damped Newton method in sketched subspaces. SGN inherits the best of all three worlds: the cheap iteration costs of sketch-and-project methods, the state-of-the-art $\mathcal O(k^{-2})$ global convergence rate of full-rank Newton-like methods, and the algorithmic simplicity of damped Newton methods. Finally, we demonstrate its comparable empirical performance to baseline algorithms.
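To make view iii) concrete, the following is a minimal sketch of a damped Newton step restricted to a random coordinate subspace, applied to a regularized logistic loss. Several choices here are assumptions for illustration and are not taken from the paper: the sketch matrix is a random subset of `tau` identity columns, the damping is the classical rule $1/(1+\lambda)$ based on the sketched Newton decrement rather than SGN's own stepsize, and all function names, the synthetic data, and the parameter `mu` are hypothetical. The code also recomputes the full margin vector at each step, so it does not attempt to reach the per-iteration cost quoted above.

```python
import numpy as np

def logistic_loss(x, A, b, mu):
    # Regularized logistic loss: mean(log(1 + exp(-b * Ax))) + (mu / 2) * ||x||^2
    z = b * (A @ x)
    return np.mean(np.logaddexp(0.0, -z)) + 0.5 * mu * (x @ x)

def sketched_grad_hess(x, A, b, mu, idx):
    # S^T grad f(x) and S^T hess f(x) S, where S holds the identity columns in idx.
    z = b * (A @ x)
    s = 0.5 * (1.0 - np.tanh(0.5 * z))   # numerically stable sigmoid(-z)
    A_S = A[:, idx]
    g_S = -(A_S.T @ (b * s)) / len(b) + mu * x[idx]
    w = s * (1.0 - s)                    # per-sample curvature weights
    H_S = (A_S.T * w) @ A_S / len(b) + mu * np.eye(len(idx))
    return g_S, H_S

def sketched_damped_newton_step(x, A, b, mu, tau, rng):
    # Assumption: coordinate sketch of rank tau, sampled uniformly without replacement.
    idx = rng.choice(x.size, size=tau, replace=False)
    g_S, H_S = sketched_grad_hess(x, A, b, mu, idx)
    h = np.linalg.solve(H_S, g_S)        # Newton direction inside the subspace
    decrement = np.sqrt(g_S @ h)         # sketched Newton decrement
    alpha = 1.0 / (1.0 + decrement)      # classical damping (assumed, not SGN's rule)
    x_new = x.copy()
    x_new[idx] -= alpha * h              # only the sketched coordinates move
    return x_new, idx

# Hypothetical synthetic problem, used only to exercise the step.
rng = np.random.default_rng(0)
n, d, tau, mu = 200, 20, 3, 0.1
A = rng.standard_normal((n, d))
b = rng.choice([-1.0, 1.0], size=n)

x0 = np.zeros(d)
x = x0
for _ in range(300):
    x, _ = sketched_damped_newton_step(x, A, b, mu, tau, rng)

loss_start = logistic_loss(x0, A, b, mu)
loss_end = logistic_loss(x, A, b, mu)
print(loss_start, loss_end)
```

Each step solves only a `tau`-by-`tau` linear system and updates `tau` coordinates, which is the source of the low-rank update structure; a faithful implementation would replace the damping rule with the stepsize defined in the paper.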

Paper Structure (37 sections, 21 theorems, 86 equations, 3 figures, 4 tables, 5 algorithms)


Key Result

Theorem 1

If $\nabla f(x^k) \in {\rm Range}\left( \nabla^2 f(x^k)\right)$, where ${\rm Range}\left( \mathcal{A}\right)$ denotes the column space of the matrix $\mathcal{A}$, then the three update rules of the method (its sketch-and-project, regularized Newton, and damped Newton forms) are equivalent. Here $\mathbf P_{x^{k}}$ is the projection matrix onto ${\rm Range}\left( \mathbf S_k\right)$ with respect to the norm ${\left \| \cdot \right\|}_{x^k}$ (defined in the paper's eq. eq:px). We call this algorithm Sketchy Global Newton (SGN).
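The equivalence between projecting a full Newton direction and taking a Newton direction in the sketched subspace can be checked numerically. The sketch below assumes the standard form of a projector onto ${\rm Range}(\mathbf S)$ that is orthogonal in the Hessian-induced norm, $\mathbf P = \mathbf S(\mathbf S^\top \mathbf H \mathbf S)^{-1}\mathbf S^\top \mathbf H$; this formula is an assumption here, since the paper's own definition is not reproduced on this page. The matrix `H` is a random positive definite stand-in for a Hessian (so the range condition holds automatically), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, tau = 6, 2

M = rng.standard_normal((d, d))
H = M @ M.T + np.eye(d)            # positive definite stand-in for the Hessian
g = rng.standard_normal(d)         # stand-in for the gradient
S = rng.standard_normal((d, tau))  # Gaussian sketch matrix (assumed)

# Assumed projector onto Range(S), orthogonal in the norm induced by H.
P = S @ np.linalg.solve(S.T @ H @ S, S.T @ H)

# Sketch-and-project view: project the full Newton direction.
projected_newton = P @ np.linalg.solve(H, g)

# Subspace Newton view: solve the tau-by-tau sketched system.
subspace_newton = S @ np.linalg.solve(S.T @ H @ S, S.T @ g)

is_idempotent = np.allclose(P @ P, P)               # P is a projection
is_H_self_adjoint = np.allclose(H @ P, (H @ P).T)   # orthogonal w.r.t. the H-norm
steps_match = np.allclose(projected_newton, subspace_newton)
print(is_idempotent, is_H_self_adjoint, steps_match)
```

The second form never touches the full $d \times d$ system, which is why the subspace view is the one that is cheap to compute; the common stepsize multiplying both directions is omitted here.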

Figures (3)

  • Figure 1: Comparison of SSCN, SGN and CD on the logistic regression loss on LIBSVM datasets for sketch matrices $\mathbf S$ of rank one. We fine-tune all algorithms for their smoothness parameters.
  • Figure 2: Comparison of SSCN, SGN, CD and ACD on logistic regression on LIBSVM datasets for sketch matrices $\mathbf S$ of rank one. We fine-tune all algorithms for smoothness parameters.
  • Figure 3: Exact Newton Descent (KSJ-Newton2018)

Theorems & Definitions (38)

  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Definition 1
  • Definition 2
  • Lemma 4
  • Proposition 1 (hanzely2022damped, Lemma 2)
  • Lemma 5
  • Theorem 2
  • ...and 28 more