Convergence analysis of stochastic higher-order majorization-minimization algorithms

Daniela Lupu, Ion Necoara

TL;DR

This work presents a stochastic higher-order algorithmic framework for minimizing the average of a very large number of sufficiently smooth functions. It derives convergence results for nonconvex and convex optimization problems when the higher-order approximation of the objective function yields an error that is p times differentiable and has a Lipschitz continuous p-th derivative.
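The smoothness condition above is what makes higher-order upper bounds available. If the p-th derivative of f is Lipschitz continuous with constant L_p, the standard Taylor remainder bound gives f(y) ≤ T_p(y; x) + L_p/(p+1)! |y - x|^{p+1}, where T_p(y; x) is the p-th order Taylor polynomial of f at x. Below is a minimal one-dimensional check of this majorization, assuming f = sin. Every derivative of sin is bounded by 1, so L_p = 1 works for each p. The function names are illustrative only.

```python
import math

def taylor_sin(y, x, p):
    """p-th order Taylor polynomial of sin around x, evaluated at y."""
    # The derivatives of sin cycle through sin, cos, -sin, -cos.
    derivs = (math.sin(x), math.cos(x), -math.sin(x), -math.cos(x))
    h = y - x
    return sum(derivs[i % 4] * h**i / math.factorial(i) for i in range(p + 1))

def upper_model(y, x, p, L_p=1.0):
    """Higher-order upper bound T_p(y; x) + L_p / (p+1)! * |y - x|**(p+1)."""
    return taylor_sin(y, x, p) + L_p / math.factorial(p + 1) * abs(y - x) ** (p + 1)

def majorizes_sin(x, p, L_p=1.0, lo=-6.0, hi=6.0, n=1201, tol=1e-12):
    """Check upper_model(y, x, p, L_p) >= sin(y) on a uniform grid of y values."""
    step = (hi - lo) / (n - 1)
    return all(upper_model(lo + k * step, x, p, L_p) >= math.sin(lo + k * step) - tol
               for k in range(n))

for p in (1, 2, 3):
    print(p, majorizes_sin(0.7, p))
print("plain tangent line:", majorizes_sin(0.7, 1, L_p=0.0))
```

With the regularization term, the model stays above sin on the whole grid for p = 1, 2, 3. Dropping it (L_p = 0) leaves the tangent line, which falls below sin where sin is convex. A grid check is a sanity test, not a proof.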

Abstract

Majorization-minimization schemes are a broad class of iterative methods targeting general optimization problems, including nonconvex, nonsmooth and stochastic problems. These algorithms successively minimize a sequence of upper bounds of the objective function, so that the objective value decreases along the iterations. We present a stochastic higher-order algorithmic framework for minimizing the average of a very large number of sufficiently smooth functions. Our stochastic framework is based on the notion of stochastic higher-order upper bound approximations of the finite-sum objective function and on minibatching. We derive convergence results for nonconvex and convex optimization problems when the higher-order approximation of the objective function yields an error that is p times differentiable and has a Lipschitz continuous p-th derivative. More precisely, for general nonconvex problems we present asymptotic stationary point guarantees, and under the Kurdyka-Lojasiewicz property we derive local convergence rates ranging from sublinear to linear. For convex problems with a uniformly convex objective function we derive local (super)linear convergence results for our algorithm. Compared to existing stochastic (first-order) methods, our algorithm adapts to the problem's curvature and allows any batch size. Preliminary numerical tests support the effectiveness of our algorithmic framework.
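The minibatch mechanism described in the abstract can be illustrated for p = 1 with quadratic upper bounds. The sketch below is a hypothetical MISO-style variant. That variant is an assumption and is not necessarily the authors' SHOM. It keeps one upper bound per component f_i, rebuilds the bounds of a random minibatch of size tau at the current iterate, and then moves to the minimizer of the average bound. The test problem f(x) = (1/N) Σ_i 0.5 (a_i x - b_i)^2 and all names are illustrative.

```python
import random

def stochastic_mm_p1(a, b, tau, iters=2000, seed=0):
    """Hypothetical stochastic MM sketch with first-order (p = 1) quadratic bounds.

    Minimizes f(x) = (1/N) * sum_i 0.5 * (a_i * x - b_i)**2 for scalar x.
    Each f_i is majorized by f_i(z_i) + f_i'(z_i)(y - z_i) + L/2 (y - z_i)**2,
    with the common constant L = max_i a_i**2.
    """
    n = len(a)
    L = max(ai * ai for ai in a)
    rng = random.Random(seed)
    x = 0.0
    anchors = [x] * n  # z_i: point where the current bound of f_i was built

    def grad(i, z):
        return a[i] * (a[i] * z - b[i])

    for _ in range(iters):
        # Rebuild the upper bounds of a random minibatch at the current iterate.
        for i in rng.sample(range(n), tau):
            anchors[i] = x
        # Closed-form minimizer of the average of the n quadratic bounds.
        x = sum(anchors[i] - grad(i, anchors[i]) / L for i in range(n)) / n
    return x

a = [1.0 + 0.05 * i for i in range(20)]
b = [(-1.0) ** i * (i % 4) for i in range(20)]
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
print(stochastic_mm_p1(a, b, tau=5), x_star)
```

With tau = N, every bound is refreshed at each iteration and the update reduces to a gradient step with step size 1/L. Smaller tau makes each iteration cheaper but leaves some bounds stale.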

Paper Structure

This paper contains 13 sections, 12 theorems, 48 equations, 3 figures, 1 table.

Key Result

Lemma 3.3

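[Nes:19] Let $f$ be a convex function whose $p$-th derivative, $p > 2$, is Lipschitz continuous with constant $L_p^f$. Then, for $M_p \geq pL_p^f$ and any $x \in \mathbb{E}$, the function
$$g(y; x) = T_p^f(y; x) + \frac{M_p}{(p+1)!}\|y - x\|^{p+1}, \qquad T_p^f(y; x) = f(x) + \sum_{i=1}^{p} \frac{1}{i!}\nabla^i f(x)[y - x]^i,$$
is convex in the first argument.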
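Lemma 3.3 can be sanity-checked numerically in one dimension. The sketch below assumes the standard regularized Taylor model g(y; x) = T_p(y; x) + M_p/(p+1)! |y - x|^{p+1} from Nesterov's analysis of tensor methods. It uses the illustrative choice f(t) = t^4 and p = 3. Here f'''' = 24, so f''' is Lipschitz with L_3 = 24, and the lemma asks for M_3 ≥ 72. The helper names are hypothetical. Nonnegative second differences on a grid are evidence of convexity, not a proof.

```python
import math

def taylor3_quartic(y, x):
    """Third-order Taylor polynomial of f(t) = t**4 around x, evaluated at y."""
    h = y - x
    return x**4 + 4 * x**3 * h + 6 * x**2 * h**2 + 4 * x * h**3

def regularized_model(y, x, M):
    """T_3(y; x) + M / (p+1)! * |y - x|**(p+1) with p = 3 (hypothetical helper)."""
    return taylor3_quartic(y, x) + M / math.factorial(4) * abs(y - x) ** 4

def convex_on_grid(func, lo=-5.0, hi=5.0, n=2001, tol=1e-9):
    """Return True if all second differences of func on a uniform grid are >= -tol."""
    step = (hi - lo) / (n - 1)
    pts = [lo + k * step for k in range(n)]
    return all(func(u) - 2 * func(v) + func(w) >= -tol
               for u, v, w in zip(pts, pts[1:], pts[2:]))

L3 = 24.0      # f''''(t) = 24, so f''' is Lipschitz with constant 24
M3 = 3 * L3    # lemma threshold M_p >= p * L_p^f
for x in (-2.0, 0.0, 1.0, 3.0):
    print(x, convex_on_grid(lambda y: regularized_model(y, x, M3)))
print("M = 0:", convex_on_grid(lambda y: regularized_model(y, 1.0, 0.0)))
```

With M = 72 the model passes the convexity check at every tested center. Without the regularizer (M = 0), the bare Taylor polynomial at x = 1 has second derivative 12 + 24(y - 1), which is negative for y < 0.5, so the check fails.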

Figures (3)

  • Figure 1: Behavior of SHOM ($p=1,2,3$) and SAGA on a8a (left) and madelon (right).
  • Figure 2: Behavior of SHOM for minibatch sizes $\tau$ ranging from $50$ to $1000$ on the a8a dataset: $p=2$ (top) and $p=3$ (bottom).
  • Figure 3: Left: behavior of SHOM and FastICA on the ICA problem. Right: sample band of the Salinas dataset recovered by FastICA and by SHOM ($p=2$).

Theorems & Definitions (26)

  • Definition 2.1
  • Example 2.2
  • Example 2.3
  • Example 2.4
  • Definition 2.5
  • Example 2.6
  • Definition 3.1
  • Example 3.2
  • Lemma 3.3
  • Example 3.4
  • ...and 16 more