A Control-Theoretic Perspective on Optimal High-Order Optimization
Tianyi Lin, Michael. I. Jordan
TL;DR
This work presents a simple yet nontrivial Lyapunov function that allows us to establish the existence and uniqueness of a global solution under certain regularity conditions and analyze the convergence properties of trajectories.
Abstract
We provide a control-theoretic perspective on optimal tensor algorithms for minimizing a convex function in a finite-dimensional Euclidean space. Given a function $Φ: \mathbb{R}^d \rightarrow \mathbb{R}$ that is convex and twice continuously differentiable, we study a closed-loop control system that is governed by the operators $\nabla Φ$ and $\nabla^2 Φ$ together with a feedback control law $λ(\cdot)$ satisfying the algebraic equation $(λ(t))^p\|\nablaΦ(x(t))\|^{p-1} = θ$ for some $θ\in (0, 1)$. Our first contribution is to prove the existence and uniqueness of a local solution to this system via the Banach fixed-point theorem. We present a simple yet nontrivial Lyapunov function that allows us to establish the existence and uniqueness of a global solution under certain regularity conditions and analyze the convergence properties of trajectories. The rate of convergence is $O(1/t^{(3p+1)/2})$ in terms of objective function gap and $O(1/t^{3p})$ in terms of squared gradient norm. Our second contribution is to provide two algorithmic frameworks obtained from discretization of our continuous-time system, one of which generalizes the large-step A-HPE framework and the other of which leads to a new optimal $p$-th order tensor algorithm. While our discrete-time analysis can be seen as a simplification and generalization of~\citet{Monteiro-2013-Accelerated}, it is largely motivated by the aforementioned continuous-time analysis, demonstrating the fundamental role that the feedback control plays in optimal acceleration and the clear advantage that the continuous-time perspective brings to algorithmic design. A highlight of our analysis is that we show that all of the $p$-th order optimal tensor algorithms that we discuss minimize the squared gradient norm at a rate of $O(k^{-3p})$, which complements the recent analysis.
