Adaptive Preconditioned Gradient Descent with Energy

Hailiang Liu, Levon Nurbekyan, Xuping Tian, Yunan Yang

TL;DR

This paper addresses constrained optimization by integrating an energy-based adaptive step size with preconditioned gradient descent. The authors introduce the Adaptive Energy Preconditioned Gradient (AEPG) method, which extends the Adaptive Energy Gradient Descent framework to preconditioned directions, enabling unconditional energy stability and provable convergence rates for general, PL, and convex objectives. They develop and analyze two concrete instances, Hessian-Riemannian gradient descent (HRGD) and natural gradient descent (NGD), and unify HRGD with NGD under a Hessian metric, including a projection mechanism for linear equality constraints. Numerical results demonstrate that AEPG accelerates convergence over classical preconditioned methods, particularly for ill-conditioned or nonconvex problems, and show strong performance in D-optimal design and Wasserstein-based tasks, indicating broad practical impact for constrained optimization in engineering and ML contexts.

Abstract

We propose an adaptive step size with an energy approach for a suitable class of preconditioned gradient descent methods. We focus on settings where the preconditioning is applied to address the constraints in optimization problems, such as the Hessian-Riemannian and natural gradient descent methods. More specifically, we incorporate these preconditioned gradient descent algorithms in the recently introduced Adaptive Energy Gradient Descent (AEGD) framework. In particular, we discuss theoretical results on the unconditional energy-stability and convergence rates across three classes of objective functions. Furthermore, our numerical results demonstrate excellent performance of the proposed method on several test bed optimization problems.
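The combination described above can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: it assumes the AEGD-style update, with energy variable $r_k$ and $F=\sqrt{f+c}$, carries over to a preconditioned direction $A^{-1}\nabla F$ in the form shown in the comments. The function name `aepg_sketch`, the callback `precond`, the constant `c`, and the test problem are all hypothetical choices made for this example.

```python
import numpy as np

def aepg_sketch(f, grad_f, precond, theta0, eta=0.1, c=1.0, n_steps=200):
    """Energy-adaptive preconditioned gradient descent (illustrative sketch).

    Assumed update, modeled on AEGD with a preconditioner A(theta):
        v_k         = grad F(theta_k),  F = sqrt(f + c)
        r_{k+1}     = r_k / (1 + 2*eta * v_k^T A^{-1} v_k)
        theta_{k+1} = theta_k - 2*eta * r_{k+1} * A^{-1} v_k
    """
    theta = np.asarray(theta0, dtype=float)
    r = np.sqrt(f(theta) + c)              # energy variable, r_0 = F(theta_0)
    energies = [r]
    for _ in range(n_steps):
        F = np.sqrt(f(theta) + c)
        v = grad_f(theta) / (2.0 * F)      # gradient of F = sqrt(f + c)
        d = np.linalg.solve(precond(theta), v)   # preconditioned direction A^{-1} v
        r = r / (1.0 + 2.0 * eta * (v @ d))      # energy can only shrink: v @ d >= 0
        theta = theta - 2.0 * eta * r * d
        energies.append(r)
    return theta, np.array(energies)

# Hypothetical test problem: ill-conditioned quadratic, preconditioned by its Hessian.
H = np.diag([1.0, 10.0])
theta, energies = aepg_sketch(
    f=lambda t: 0.5 * t @ H @ t,
    grad_f=lambda t: H @ t,
    precond=lambda t: H,
    theta0=[1.0, 1.0],
)
```

In this sketch the energy `r` never increases for any `eta > 0`, provided the preconditioner is symmetric positive definite, because the denominator in the energy update is at least 1. The effective step size `2 * eta * r` therefore adapts downward on its own rather than being tuned by hand.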

Paper Structure

This paper contains 29 sections, 12 theorems, 152 equations, 3 figures, 3 tables, 2 algorithms.

Key Result

Theorem 2.1

AEPG is unconditionally energy stable. Specifically, for any step size $\eta>0$, ... This implies that $r_k$ is strictly decreasing and converges to $r^*$ as $k\to \infty$. Furthermore, ...

Figures (3)

  • Figure 1: Contour plot and trajectories of AEPG and HRGD on two constrained optimization problems: the quadratic problem with $\alpha=10$ (a), and the Rosenbrock problem with $\alpha=100$ (b). In each plot, the red star represents the minimum point.
  • Figure 2: Comparison of computational time (in seconds) between AEPG and the FW/FW-away method for the D-optimal design problem. The datasets have varying dimensions of test vectors $m$ and a fixed number of test vectors $n=1000$.
  • Figure 3: Gaussian mixture model: level sets, vector fields, and convergent paths using (a) methods with standard gradient and (b) methods with the Wasserstein natural gradient.

Theorems & Definitions (32)

  • Theorem 2.1: Unconditional energy stability
  • proof
  • Lemma 2.2
  • proof
  • Remark 2.3
  • Remark 2.4
  • Theorem 2.5
  • proof
  • Theorem 2.6
  • proof
  • ...and 22 more