Optimization Methods Rooting in Optimal Control
Huanshui Zhang, Hongxia Wang
TL;DR
The paper addresses limitations of classical optimization methods, in particular sensitivity to hyperparameters and Hessian degeneracy, by viewing optimization problems (OPs) as optimal control problems (OCPs). From the OCP $\min_{u} \sum_{k=0}^{N} \left[ f(x_k) + \tfrac{1}{2} u_k^{\top} R u_k \right] + f(x_{N+1})$ subject to $x_{k+1}=x_k+u_k$, it derives the update rule $x_{k+1}=x_k - R^{-1} \sum_{i=k+1}^{N+1} f'(x_i)$ and analyzes its convergence. A key result is linear convergence with contraction factor $\left(\frac{R}{R+f''(x_*)}\right)^{N+1}$ under $f'(x_*)=0$ and $f''(x_*)>0$, which can be driven toward near-quadratic behavior by choosing a small $R$ and a large $N$. Newton's method is recovered at $R=0$, and because $R$ is positive definite the method remains usable when $f''(x)$ is ill-conditioned or zero. The result is a tunable, Hessian-robust optimization algorithm with a clear interpretation in terms of optimal control.
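To make the roles of $R$ and $N$ concrete, the following minimal sketch evaluates the contraction factor quoted above in the scalar case. The function name `contraction_factor` and the sample values are illustrative assumptions, not taken from the paper; the code only evaluates the formula and does not implement the algorithm itself.

```python
def contraction_factor(R: float, hess: float, N: int) -> float:
    """Scalar contraction factor (R / (R + f''(x*)))**(N + 1).

    R    : control weight (assumed scalar and non-negative here)
    hess : f''(x*) at the minimizer, assumed strictly positive
    N    : control terminal time (non-negative integer)
    """
    if R < 0 or hess <= 0 or N < 0:
        raise ValueError("need R >= 0, hess > 0 and N >= 0")
    return (R / (R + hess)) ** (N + 1)


if __name__ == "__main__":
    hess = 2.0  # hypothetical value of f''(x*)
    for R in (1.0, 0.1):
        for N in (0, 4):
            rho = contraction_factor(R, hess, N)
            print(f"R={R:<4} N={N}  factor={rho:.3e}")
```

A smaller factor means the error shrinks faster per iteration, so decreasing $R$ or increasing $N$ both speed up the predicted local convergence; at $R=0$ the factor is zero.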
Abstract
In this paper, we propose solving optimization problems (OPs), and understanding Newton's method, from an optimal control viewpoint. We propose a new optimization algorithm based on the optimal control problem (OCP). The algorithm converges more rapidly than gradient descent and is superior to Newton's method in that it does not diverge in general and can be applied when the Hessian matrix is singular. These merits are supported by the convergence analysis in the paper. We also point out that the convergence rate of the proposed algorithm is inversely proportional to the magnitude of the control weight matrix and proportional to the control terminal time inherited from the OCP.
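The claim that the method does not diverge where Newton's method can may be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a scalar $R$, takes only the $N=0$ case of the update rule, which reads $x_1 = x_0 - R^{-1} f'(x_1)$, applies it repeatedly, and solves each implicit step by bisection. The test function $f(x)=\sqrt{1+x^2}$ and all function names are hypothetical choices made for this illustration.

```python
import math


def grad(x: float) -> float:
    """f'(x) for the toy objective f(x) = sqrt(1 + x^2)."""
    return x / math.sqrt(1.0 + x * x)


def hess(x: float) -> float:
    """f''(x) for the same objective; it decays quickly as |x| grows."""
    return (1.0 + x * x) ** -1.5


def implicit_step(x0: float, R: float, iters: int = 200) -> float:
    """Solve y = x0 - grad(y) / R for y by bisection (N = 0 case).

    The residual y - x0 + grad(y) / R is increasing in y for convex f,
    and |grad| < 1 for this f, so the root lies in [x0 - 1/R, x0 + 1/R].
    """
    lo, hi = x0 - 1.0 / R, x0 + 1.0 / R
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - x0 + grad(mid) / R > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def newton_step(x: float) -> float:
    """Classical Newton step x - f'(x) / f''(x)."""
    return x - grad(x) / hess(x)


def run(step, x0: float, n: int) -> float:
    """Apply a one-step map n times starting from x0."""
    x = x0
    for _ in range(n):
        x = step(x)
    return x


if __name__ == "__main__":
    x0 = 2.0
    print("implicit, R=1, 60 steps:", run(lambda x: implicit_step(x, 1.0), x0, 60))
    print("Newton, 3 steps:        ", run(newton_step, x0, 3))
```

For this objective the Newton step reduces to $x \mapsto -x^3$, so it diverges from any start with $|x_0|>1$, whereas the implicit step moves toward the minimizer at $x_*=0$ from the same starting point.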
