A Discrete Variational Derivation of Accelerated Methods in Optimization

Cédric M. Campos, Alejandro Mahillo, David Martín de Diego

TL;DR

The paper develops a geometric, variational framework for accelerated optimization by combining discrete variational mechanics with time-dependent Lagrangians. It derives Polyak's Heavy Ball and Nesterov acceleration from the discrete Hamilton and Lagrange-d'Alembert principles, respectively, and establishes a one-to-one correspondence between the two families via forcing terms. Through Bregman Lagrangians and cosymplectic geometry, the authors build explicit and implicit discretizations that preserve the fibre-wise symplectic structure, and they validate the approach with simulations across multiple objective functions. The work offers a principled route to designing acceleration schemes with geometric guarantees and highlights practical performance trade-offs among dilation choices, discretizations, and auxiliary optimization steps.

Abstract

Many of the new developments in machine learning are connected with gradient-based optimization methods. Recently, these methods have been studied from a variational perspective, which has opened up the possibility of introducing variational and symplectic methods using geometric integration. In this paper, we introduce variational integrators that allow us to derive different methods for optimization. Using both Hamilton's principle and the Lagrange-d'Alembert principle, we derive two families of optimization methods, in one-to-one correspondence, that generalize Polyak's heavy ball and the well-known Nesterov accelerated gradient method, respectively; the second mimics the behavior of the first while reducing the oscillations of classical momentum methods. However, since the systems considered are explicitly time-dependent, the preservation of symplecticity of autonomous systems occurs here solely on the fibers. Several experiments exemplify the result.
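For orientation, the two classical methods that the abstract says are generalized can be sketched in their textbook, constant-coefficient forms. This is only an illustrative sketch, not the time-dependent schemes derived in the paper: the quadratic objective, the function names (`grad_f`, `heavy_ball`, `nesterov`), and the step and momentum values are all assumptions chosen for the example.

```python
import numpy as np


def grad_f(x, A):
    """Gradient of the assumed test objective f(x) = 0.5 * x^T A x."""
    return A @ x


def heavy_ball(x0, A, step, momentum, iters):
    """Textbook Polyak heavy ball: gradient evaluated at the current iterate."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        x_next = x + momentum * (x - x_prev) - step * grad_f(x, A)
        x_prev, x = x, x_next
    return x


def nesterov(x0, A, step, momentum, iters):
    """Textbook Nesterov accelerated gradient: gradient evaluated at the look-ahead point."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        y = x + momentum * (x - x_prev)   # extrapolated (look-ahead) point
        x_next = y - step * grad_f(y, A)  # gradient step taken from y
        x_prev, x = x, x_next
    return x


# Hypothetical ill-conditioned quadratic with minimum at the origin.
A = np.diag([1.0, 10.0])
x0 = np.array([1.0, 1.0])
x_hb = heavy_ball(x0, A, step=0.05, momentum=0.5, iters=200)
x_nag = nesterov(x0, A, step=0.05, momentum=0.5, iters=200)
```

The only difference between the two loops is where the gradient is evaluated: at the current iterate for heavy ball, and at the extrapolated point for Nesterov's method.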

Paper Structure

This paper contains 24 sections, 4 theorems, 111 equations, 7 figures.

Key Result

Theorem 1

We have that $\Psi_{t,s}\colon \mathcal{U}_t\subseteq T^*Q\to T^*Q$ is a symplectomorphism, that is, $\Psi_{t,s}^*\omega_Q=\omega_Q$.
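As a reading aid, the condition $\Psi_{t,s}^*\omega_Q=\omega_Q$ can be written out in canonical coordinates. The block below is a standard restatement, assuming local coordinates $(q^i,p_i)$ on $T^*Q$ with $\dim Q = n$ and the sign convention $\omega_Q = dq^i\wedge dp_i$; the paper may use the opposite sign, which does not change the matrix condition.

```latex
% Canonical symplectic form on T^*Q (sign convention assumed)
\omega_Q = \sum_{i=1}^{n} dq^i \wedge dp_i ,
\qquad
J = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix} .

% Pull-back condition expressed through the Jacobian of \Psi_{t,s}
\Psi_{t,s}^{*}\,\omega_Q = \omega_Q
\quad\Longleftrightarrow\quad
\bigl(D\Psi_{t,s}\bigr)^{\top} J \,\bigl(D\Psi_{t,s}\bigr) = J
\quad \text{at every point of } \mathcal{U}_t .
```

In other words, the Jacobian of $\Psi_{t,s}$ in these coordinates is a symplectic matrix wherever the map is defined.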

Figures (7)

  • Figure 1: Trajectory slices near the local minimum of the YATF using PHB/CM (pale) and NAG (strong) with the bounded coefficients from the Lagrangian's polynomial dilation with $n=3$. A nonlinear grayscale gradient indicates the minimum's location in black.
  • Figure 2: YATF residual values along PHB/CM (pale) and NAG (strong) trajectories for coefficients from the polynomially dilated Lagrangian with $n=3$ (blue) and $n=4$ (violet), and from the exponentially dilated Lagrangian (green).
  • Figure 3: Quadratic test function values along trajectories computed with NAG for constant (green), bounded (blue), and unbounded (violet) coefficients, and WWJ (red); the latter three set with $n=3$ (top) and $n=4$ (bottom).
  • Figure 4: Rosenbrock's test function values along trajectories computed with NAG for constant (green), bounded (blue), and unbounded (violet) coefficients, and WWJ (red); the latter three set with $n=3$ (top) and $n=4$ (bottom).
  • Figure 5: YATF residual values along trajectories computed with NAG for constant (green), bounded (blue), and unbounded (violet) coefficients, and WWJ (red); the latter three set with $n=3$ (top) and $n=4$ (bottom).
  • ...and 2 more figures

Theorems & Definitions (12)

  • Theorem 1
  • Proposition 2
  • Remark 3
  • Lemma 4
  • Remark 5
  • Theorem 6
  • Remark 7: One-to-one correspondence
  • Remark 8: Initial conditions
  • Remark 9: Natural trajectory
  • Remark 10: Discrete flow
  • ...and 2 more