Optimal Projection-Free Adaptive SGD for Matrix Optimization

Dmitry Kovalev

Abstract

Recently, Jiang et al. [2026] developed Leon, a practical variant of the One-sided Shampoo algorithm [Xie et al., 2025a, An et al., 2025] for online convex optimization, which does not require computing a costly quadratic projection at each iteration. Unfortunately, according to the existing analysis, Leon requires tuning an additional hyperparameter in its preconditioner and cannot achieve dimension-independent convergence guarantees for convex optimization problems beyond the bounded-gradients assumption. In this paper, we resolve these issues by proving certain stability properties of Leon's preconditioner. Using our improved analysis, we show that tuning the extra hyperparameter can be avoided and, more importantly, develop the first practical variant of One-sided Shampoo with Nesterov acceleration that does not require computing projections at each iteration. As a side contribution, we obtain improved dimension-independent rates in the non-smooth non-convex setting and develop a unified analysis of the proposed algorithm, which yields accelerated projection-free adaptive SGD with (block-)diagonal preconditioners.
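To make the setting concrete, the following is a minimal sketch of a one-sided (Shampoo-style) preconditioned SGD step for a matrix parameter. This is an illustration only, not the Leon algorithm or the authors' method: the accumulation rule, the exponent $-1/2$, and the damping constant `eps` are assumptions chosen for the sketch, and the projection-free mechanics that distinguish Leon are not reproduced here.

```python
import numpy as np

def one_sided_shampoo_step(W, G, L, lr=0.1, eps=1e-8):
    """One preconditioned step for a matrix parameter W with gradient G.

    Only the left (row) side is preconditioned, using the accumulated
    statistic L = sum of G G^T over past steps. Illustrative sketch only.
    """
    # Accumulate the left second-moment statistic.
    L = L + G @ G.T
    # Form the damped inverse square root (L + eps*I)^(-1/2)
    # via an eigendecomposition of the symmetric matrix L.
    vals, vecs = np.linalg.eigh(L + eps * np.eye(L.shape[0]))
    P = vecs @ np.diag(vals ** -0.5) @ vecs.T
    # Apply the left preconditioner to the gradient and take a step.
    return W - lr * (P @ G), L
```

A one-sided scheme like this keeps only the smaller of the two Shampoo statistics, which is why it is attractive for matrix (e.g. weight-matrix) optimization; the paper's contribution concerns doing this without the costly per-iteration projection.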

Paper Structure

This paper contains 21 sections, 12 theorems, 39 equations, and 2 algorithms.

Key Result

Lemma 1

<lem:Psi> The function $\Psi_k^*(m)$ is convex and differentiable, with an explicit expression for its gradient.

Theorems & Definitions (13)

  • lemma 1
  • lemma 2
  • theorem 1: generalization of Theorem 3.2 of Jiang et al. [2026]
  • lemma 3
  • lemma 4
  • theorem 2
  • theorem 3
  • definition 1
  • theorem 4
  • lemma 5
  • ...and 3 more