Continuous Newton-like Methods featuring Inertia and Variable Mass

Camille Castera, Hedy Attouch, Jalal Fadili, Peter Ochs

TL;DR

This work extends the class of inertial Newton-like dynamics by featuring a time-dependent parameter in front of the acceleration, called variable mass, and provides guarantees on how the Newtonian and inertial behaviors of the system can be non-asymptotically controlled by means of this variable mass.

Abstract

We introduce a new dynamical system, at the interface between second-order dynamics with inertia and Newton's method. This system extends the class of inertial Newton-like dynamics by featuring a time-dependent parameter in front of the acceleration, called variable mass. For strongly convex optimization, we provide guarantees on how the Newtonian and inertial behaviors of the system can be non-asymptotically controlled by means of this variable mass. A connection with the Levenberg--Marquardt (or regularized Newton's) method is also made. We then show the effect of the variable mass on the asymptotic rate of convergence of the dynamics, and in particular, how it can turn the latter into an accelerated Newton method. We provide numerical experiments supporting our findings. This work represents a significant step towards designing new algorithms that benefit from the best of both first- and second-order optimization methods.
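To make the role of the variable mass concrete, here is a minimal numerical sketch. It rests on assumptions that are not stated in this summary: the system is taken to have the form $\varepsilon(t)\ddot{x}(t) + \alpha(t)\dot{x}(t) + \beta\nabla^2 f(x(t))\dot{x}(t) + \nabla f(x(t)) = 0$ with $\beta = 1$, the continuous Newton flow is taken as $\nabla^2 f(x)\dot{x} + \nabla f(x) = 0$, and the quadratic objective, initial data, and the choices of $\varepsilon$ and $\alpha$ are purely illustrative. The function name `sup_distance_to_newton` is hypothetical.

```python
import math

def sup_distance_to_newton(eps, alpha, hess=(1.0, 10.0),
                           t0=1.0, t1=4.0, steps=20000):
    """Largest distance over [t0, t1] between the inertial trajectory and the
    continuous Newton trajectory, for f(x) = 0.5 * sum(hess[i] * x[i]**2).

    ASSUMED dynamics (beta = 1), not taken from the summary above:
        eps(t) x'' + alpha(t) x' + H x' + H x = 0,  H = diag(hess),
    started at x(t0) = (1, ..., 1) with zero initial velocity.
    """
    n = len(hess)

    def rhs(t, y):
        # First-order form: y = (x, v) with v = x'.
        x, v = y[:n], y[n:]
        acc = [-((alpha(t) + hess[i]) * v[i] + hess[i] * x[i]) / eps(t)
               for i in range(n)]
        return v + acc  # list concatenation: (x', v')

    h = (t1 - t0) / steps
    y = [1.0] * n + [0.0] * n
    t, worst = t0, 0.0
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step.
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = rhs(t + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
        t += h
        # On a quadratic, the Newton flow H x' + H x = 0 reduces to x' = -x,
        # so every coordinate equals exp(-(t - t0)) for this initial point.
        newton = math.exp(-(t - t0))
        dist = math.sqrt(sum((y[i] - newton) ** 2 for i in range(n)))
        worst = max(worst, dist)
    return worst

alpha = lambda t: 0.1 / t**2                                # illustrative damping
d_slow = sup_distance_to_newton(lambda t: 1.0, alpha)       # constant mass
d_fast = sup_distance_to_newton(lambda t: 0.05 / t**2, alpha)  # vanishing mass
print(d_slow, d_fast)
```

Under these assumptions, the run with the small, vanishing mass should stay noticeably closer to the Newton trajectory than the run with constant unit mass. The step count is chosen so that the explicit integrator remains stable for the stiffest case considered here; a smaller $\varepsilon$ or a larger Hessian eigenvalue would call for more steps or an implicit scheme.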

Paper Structure (27 sections, 12 theorems, 76 equations, 6 figures, 1 table)

Key Result

Theorem 2.1

Under Assumption \ref{ass:general}, there exists a unique global solution $x:\mathbb{R}_+\to \mathbb{R}^n$ to (VM-DIN-AVD).

Figures (6)

  • Figure 1: Left: phase diagram of the distances from (VM-DIN-AVD) to (CN) and (LM) (see Section \ref{sec::Control}). The color of each patch indicates which distance is considered, and the scaling of a corresponding upper bound on this distance is written on it (in white for prior work, in black for our contributions). The green line separates the cases $\varepsilon\geq \alpha$ (above) and $\varepsilon\leq \alpha$ (below). Right: 2D illustration of the trajectories of (VM-DIN-AVD) for several choices of $\varepsilon$ on a quadratic function. Fast-vanishing $\varepsilon(t)$ (dark-blue solid curves) brings solutions of (VM-DIN-AVD) close to those of (CN), making them more robust to bad conditioning than first-order dynamics (e.g., gradient descent).
  • Figure 2: Comparison of the solutions $x_N$, $x_{LM}$ and $x$ of (CN), (LM) and (VM-DIN-AVD) respectively, for a strongly convex function of the form $f(x)=e^{-\Vert x\Vert^2} + \frac{1}{2} \Vert Ax\Vert^2$. Left figures: distance $\Vert x(t)-x_N(t)\Vert$ versus time $t$; each curve corresponds to a different choice of $\varepsilon$. Middle figures: distance $\Vert x(t)-x_{LM}(t)\Vert$, again for several $\varepsilon$. Right figures: distance to the optimum $x^\star$; for reference, $x_N$ and $x_{LM}$ are shown as dotted and dashed lines, and the other curves correspond to (VM-DIN-AVD) for several choices of $\varepsilon$. The brown curve is often hidden behind the purple (and sometimes the pink) curve. Top and bottom rows show results for non-integrable and integrable viscous dampings $\alpha$, respectively. For the sake of readability, the theoretical bounds from Theorem \ref{thm::mainGenResult} are displayed only in Figure 4 below.
  • Figure 3: Similar experiment and figures as those described in Figure 2, but for the function $f(x)=\log\left(\sum_{i=1}^n e^{x_i} + e^{-x_i}\right) + \frac{1}{2} \Vert Ax\Vert^2$.
  • Figure 4: Similar experiment and figures as those described in Figure 2, but for the function $f(x)=\sum_{i=1}^n x_i^{50} + \frac{1}{2} \Vert Ax\Vert^2$. The thin dash-dotted curves represent approximations of the theoretical bounds from Theorem \ref{thm::mainGenResult} for each choice of $(\varepsilon,\alpha)$ considered.
  • Figure 5: Numerical validation of Theorem \ref{thm::MainResAsymptotic}: distance to the optimum $x^\star$ as a function of time on a quadratic function $f(x)=\frac{1}{2}\Vert Ax\Vert^2$. Left: speed comparison with (CN) for several choices of $\varepsilon$ and $\alpha$. Right: comparison with (LM) for $\alpha$ integrable or not and several choices of $\varepsilon$. Shades of blue represent cases where $\varepsilon(t)>\alpha(t)$, while shades of red represent the opposite setting.
  • ...and 1 more figure
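The conditioning remark in the caption of Figure 1 can be checked by hand on the quadratic used in Figure 5. The computation below is a sketch that assumes the standard continuous Newton flow $\nabla^2 f(x)\dot{x} + \nabla f(x) = 0$ and the gradient flow $\dot{x} = -\nabla f(x)$, with $A$ invertible and an initial point $x_0$ at time $t_0$; these forms are assumptions, since the paper's exact equations are not reproduced in this summary.

$$
\begin{aligned}
f(x) &= \tfrac{1}{2}\Vert Ax\Vert^2, \qquad \nabla f(x) = A^\top A\,x, \qquad \nabla^2 f(x) = A^\top A,\\
\text{Newton flow:}\quad & A^\top A\,\dot{x}_N + A^\top A\,x_N = 0 \;\Longrightarrow\; \dot{x}_N = -x_N \;\Longrightarrow\; x_N(t) = e^{-(t-t_0)}\,x_0,\\
\text{gradient flow:}\quad & \dot{x} = -A^\top A\,x \;\Longrightarrow\; x(t) = e^{-(t-t_0)A^\top A}\,x_0 .
\end{aligned}
$$

The Newton trajectory decays at rate $1$ whatever the matrix $A$, whereas the slowest mode of the gradient flow decays at rate $\sigma_{\min}(A)^2$, which can be arbitrarily small when $A$ is badly conditioned.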

Theorems & Definitions (30)

  • Theorem 2.1
  • Remark 2.2
  • Theorem 3.1
  • Remark 3.2
  • Remark 3.3
  • Lemma 3.4
  • proof
  • Lemma 3.5
  • Proof of Theorem \ref{thm::mainEpsLarge}
  • Corollary 3.6
  • ...and 20 more