A Control-Theoretic Perspective on Optimal High-Order Optimization

Tianyi Lin, Michael I. Jordan

TL;DR

This work presents a simple yet nontrivial Lyapunov function that allows us to establish the existence and uniqueness of a global solution under certain regularity conditions and analyze the convergence properties of trajectories.

Abstract

We provide a control-theoretic perspective on optimal tensor algorithms for minimizing a convex function in a finite-dimensional Euclidean space. Given a function $\Phi: \mathbb{R}^d \rightarrow \mathbb{R}$ that is convex and twice continuously differentiable, we study a closed-loop control system that is governed by the operators $\nabla \Phi$ and $\nabla^2 \Phi$ together with a feedback control law $\lambda(\cdot)$ satisfying the algebraic equation $(\lambda(t))^p\|\nabla \Phi(x(t))\|^{p-1} = \theta$ for some $\theta \in (0, 1)$. Our first contribution is to prove the existence and uniqueness of a local solution to this system via the Banach fixed-point theorem. We present a simple yet nontrivial Lyapunov function that allows us to establish the existence and uniqueness of a global solution under certain regularity conditions and analyze the convergence properties of trajectories. The rate of convergence is $O(1/t^{(3p+1)/2})$ in terms of objective function gap and $O(1/t^{3p})$ in terms of squared gradient norm. Our second contribution is to provide two algorithmic frameworks obtained from discretization of our continuous-time system, one of which generalizes the large-step A-HPE framework and the other of which leads to a new optimal $p$-th order tensor algorithm. While our discrete-time analysis can be seen as a simplification and generalization of Monteiro and Svaiter (2013), it is largely motivated by the aforementioned continuous-time analysis, demonstrating the fundamental role that the feedback control plays in optimal acceleration and the clear advantage that the continuous-time perspective brings to algorithmic design. A highlight of our analysis is that we show that all of the $p$-th order optimal tensor algorithms that we discuss minimize the squared gradient norm at a rate of $O(k^{-3p})$, which complements the recent analysis.
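The algebraic equation for the feedback control law can be solved in closed form whenever the gradient is nonzero: $\lambda = (\theta / \|\nabla \Phi(x)\|^{p-1})^{1/p}$. The following is a minimal sketch of that one step only, not the authors' implementation. The function name `feedback_gain` and the list-of-floats gradient representation are hypothetical choices made for illustration.

```python
import math


def feedback_gain(grad, p, theta):
    """Return lambda > 0 solving lambda**p * ||grad||**(p - 1) == theta.

    `grad` is the gradient of Phi at the current point, given as a sequence
    of floats. The name and interface are illustrative assumptions.
    """
    if not (0.0 < theta < 1.0):
        raise ValueError("theta must lie in (0, 1)")
    if p < 1:
        raise ValueError("p must be a positive integer")

    # Euclidean norm of the gradient.
    grad_norm = math.sqrt(sum(g * g for g in grad))

    # For p > 1 the equation has no finite solution at a stationary point.
    if grad_norm == 0.0 and p > 1:
        raise ValueError("gradient is zero; lambda is undefined for p > 1")

    # Closed-form solution of the algebraic equation.
    return (theta / grad_norm ** (p - 1)) ** (1.0 / p)


if __name__ == "__main__":
    lam = feedback_gain([3.0, 4.0], p=2, theta=0.5)
    # Residual of the algebraic equation; should be near zero.
    print(lam ** 2 * 5.0 - 0.5)
```

Note that for $p = 1$ the gradient norm drops out and the gain is the constant $\theta$, while for $p > 1$ the gain grows as the gradient shrinks.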

Paper Structure

This paper contains 39 sections, 22 theorems, 163 equations, and 4 algorithms.

Key Result

Proposition 2.3

Fixing $x \in \mathbb{R}^d$ with $\nabla \Phi(x) \neq 0$, the mapping $\varphi(\cdot, x)$ satisfies

Theorems & Definitions (46)

  • Remark 2.1
  • Remark 2.2
  • Proposition 2.3
  • proof
  • Proposition 2.4
  • proof
  • Proposition 2.5
  • proof
  • Theorem 2.6
  • proof
  • ...and 36 more