A stochastic first-order method with multi-extrapolated momentum for highly smooth unconstrained optimization

Chuan He

TL;DR

This work tackles unconstrained stochastic optimization when the objective $f$ has a Lipschitz continuous $p$th-order derivative ($p\ge 2$). It introduces a stochastic first-order method with multi-extrapolated momentum that performs $p-1$ extrapolations per iteration and uses a time-varying, momentum-based gradient estimator designed to exploit higher-order smoothness. Under standard assumptions plus $D^p f$ being Lipschitz, the author proves a sample complexity of $\widetilde{\mathcal{O}}(\epsilon^{-(3p+1)/p})$ to obtain $\mathbb{E}[\|\nabla f(x)\|]\le \epsilon$, improving upon the best-known bounds that do not assume mean-squared smoothness. Numerical experiments on data fitting and robust regression problems support the theoretical gains and illustrate the practical performance of the multi-extrapolated momentum approach.
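To make the dependence on $p$ concrete, the short script below evaluates the exponent $(3p+1)/p = 3 + 1/p$ from the stated bound for a few values of $p$. It is only an arithmetic illustration; the function name `sample_complexity_exponent` is a hypothetical helper, and logarithmic factors hidden by $\widetilde{\mathcal{O}}$ are ignored.

```python
from fractions import Fraction


def sample_complexity_exponent(p: int) -> Fraction:
    """Return (3p + 1) / p, the exponent e in the bound O~(eps^{-e}).

    Hypothetical helper for illustration; p is the order of the
    Lipschitz continuous derivative and must satisfy p >= 2.
    """
    if p < 2:
        raise ValueError("the stated bound assumes p >= 2")
    return Fraction(3 * p + 1, p)


if __name__ == "__main__":
    # The exponent equals 3 + 1/p, so it decreases toward 3 as p grows.
    for p in (2, 3, 4, 10, 100):
        e = sample_complexity_exponent(p)
        print(f"p = {p:>3}: exponent = {e} (about {float(e):.3f})")
```

For $p=2$ the exponent is $7/2$, for $p=3$ it is $10/3$, and it approaches $3$ from above as $p$ increases, which is the sense in which higher-order smoothness yields a better rate.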

Abstract

In this paper, we consider an unconstrained stochastic optimization problem where the objective function exhibits high-order smoothness. Specifically, we propose a new stochastic first-order method (SFOM) with multi-extrapolated momentum, in which multiple extrapolations are performed in each iteration, followed by a momentum update based on these extrapolations. We demonstrate that the proposed SFOM can accelerate optimization by exploiting the high-order smoothness of the objective function $f$. Assuming that the $p$th-order derivative of $f$ is Lipschitz continuous for some $p\ge 2$, and under additional mild assumptions, we establish that our method achieves a sample complexity of $\widetilde{\mathcal{O}}(\epsilon^{-(3p+1)/p})$ for finding a point $x$ such that $\mathbb{E}[\|\nabla f(x)\|]\le \epsilon$. To the best of our knowledge, this is the first SFOM to leverage arbitrary-order smoothness of the objective function for acceleration, resulting in a sample complexity that improves upon the best-known results without assuming the mean-squared smoothness condition. Preliminary numerical experiments validate the practical performance of our method and support our theoretical findings.
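The pattern described above (extrapolate, query a stochastic gradient at the extrapolated point, then update a momentum estimate) can be illustrated with a minimal sketch that uses a single extrapolation per iteration. This is not the paper's algorithm: its extrapolation weights, step sizes, and the $p-1$ extrapolation points are not given in this summary. The sketch instead uses a simple normalized momentum scheme in the style of implicit gradient transport, and the test function, the names `noisy_grad` and `run`, and the parameters `eta`, `beta`, and `sigma` are all assumptions chosen for illustration.

```python
import math
import random


def noisy_grad(x, rng, sigma=0.1):
    """Stochastic gradient of the toy objective f(x) = 0.5 * ||x||^2.

    The exact gradient is x; Gaussian noise models the stochastic oracle.
    """
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def run(x0, steps=600, eta=0.01, beta=0.9, seed=0):
    """Normalized momentum with ONE extrapolation per iteration.

    Illustrative stand-in only, not the multi-extrapolated method itself.
    """
    rng = random.Random(seed)
    x = list(x0)
    m = noisy_grad(x, rng)  # initialize momentum with one gradient sample
    c = beta / (1.0 - beta)  # extrapolation weight (assumed choice)
    for _ in range(steps):
        norm = math.sqrt(sum(mi * mi for mi in m)) or 1.0
        # Normalized step along the current momentum estimate.
        x_new = [xi - eta * mi / norm for xi, mi in zip(x, m)]
        # Extrapolate past the new iterate along the last displacement.
        z = [xn + c * (xn - xo) for xn, xo in zip(x_new, x)]
        # Momentum update with a gradient sampled at the extrapolated point.
        g = noisy_grad(z, rng)
        m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]
        x = x_new
    return x


if __name__ == "__main__":
    x_final = run([3.0, -2.0])
    # For this toy objective the gradient norm equals ||x||.
    print("final gradient norm:", math.hypot(*x_final))
```

Evaluating the gradient at the extrapolated point `z` rather than at the iterate is what lets a momentum average cancel part of the bias caused by mixing gradients from different iterates; according to the abstract, the proposed method performs several such extrapolations per iteration before its momentum update.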

Paper Structure

This paper contains 14 sections, 15 theorems, 95 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Under the $p$th-order smoothness assumption (label `asp:pth-smth`), the following inequality holds:

Figures (3)

  • Figure 1: Visualization of the updates for $\{z^{k,t}\}_{1\le t\le 3}$ (left) and $m^k$ (right) on a contour plot.
  • Figure 2: Convergence behavior of the relative objective value for all SFOMs in solving the data-fitting problem.
  • Figure 3: Convergence behavior of the relative loss for all SFOMs in solving the robust regression problem.

Theorems & Definitions (33)

  • Remark 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 1
  • Theorem 2
  • Remark 2
  • Lemma 4
  • Lemma 5
  • Theorem 3
  • ...and 23 more