Advancing Multi-Secant Quasi-Newton Methods for General Convex Functions

Mokhwa Lee; Yifan Sun

TL;DR

The paper addresses stability problems in multisecant quasi-Newton (QN) methods for general convex functions by combining symmetrization with a cheap diagonal positive semidefinite (PSD) perturbation of the Hessian approximation. It proves local $q$-superlinear convergence under standard smoothness and convexity assumptions as the perturbation decays to zero, and shows in numerical experiments that the method is competitive in practice. The authors also extend the approach to the limited-memory setting (L-BFGS) and explore nonconvex neural network training, where it improves convergence in ill-conditioned scenarios but raises stability concerns that call for adaptive techniques. Overall, multisecant QN with PSD perturbation converges faster than single-secant updates on ill-conditioned landscapes and offers a path toward scalable higher-order optimization in machine learning and scientific computing.

Abstract

Quasi-Newton (QN) methods provide an efficient alternative to second-order methods for solving smooth unconstrained minimization problems. While QN methods generally build a Hessian estimate from one secant interpolation per iteration, multisecant methods use multiple secant interpolations and can improve the quality of the Hessian estimate at small additional overhead cost. However, implementing multisecant QN methods raises several key challenges involving method stability. The most critical is that when the objective function is convex but not quadratic, the Hessian approximation is not, in general, symmetric positive semidefinite (PSD), so the steps are not guaranteed to be descent directions. We therefore investigate a symmetrized and PSD-perturbed Hessian approximation method for multisecant QN. We offer an efficiently computable method for producing the PSD perturbation, show superlinear convergence of the new method, and demonstrate improved numerical performance on general convex minimization problems. We also investigate the limited-memory extension of the method, focusing on BFGS, on both convex and non-convex functions. Our results suggest that in ill-conditioned optimization landscapes, leveraging multiple secants can accelerate convergence and yield higher-quality solutions compared to traditional single-secant methods.
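The stability issue described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' algorithm: the test function, the multisecant Broyden-type update started from the identity, and the eigenvalue-based shift `mu` are all assumptions chosen for illustration (the abstract does not specify how the paper computes its perturbation, and a full eigenvalue computation is likely more expensive than what the paper proposes). The function names `grad`, `multisecant_broyden`, and `symmetrize_and_shift` are hypothetical.

```python
import numpy as np

def grad(x):
    # Gradient of an assumed convex, non-quadratic test function
    # f(x) = sum(exp(x)) + 0.5 * ||x||^2 (applied elementwise).
    return np.exp(x) + x

def multisecant_broyden(S, Y):
    # Multisecant Broyden-type update starting from B0 = I.
    # It satisfies all secant equations B @ S = Y at once when S has
    # full column rank, but the result is generally not symmetric.
    n = S.shape[0]
    return np.eye(n) + (Y - S) @ np.linalg.solve(S.T @ S, S.T)

def symmetrize_and_shift(B, eps=1e-6):
    # Symmetrize, then add a diagonal shift mu * I so that the smallest
    # eigenvalue is at least eps. The eigenvalue-based choice of mu is a
    # stand-in for illustration, not the paper's perturbation.
    B_sym = 0.5 * (B + B.T)
    lam_min = np.linalg.eigvalsh(B_sym)[0]  # eigenvalues are ascending
    mu = max(0.0, -lam_min) + eps
    return B_sym + mu * np.eye(B.shape[0]), mu

rng = np.random.default_rng(0)
n, p = 5, 3                           # dimension, number of secant pairs
X = rng.standard_normal((n, p + 1))   # p + 1 iterates stored as columns
S = np.diff(X, axis=1)                # steps s_i = x_{i+1} - x_i
Y = np.diff(grad(X), axis=1)          # gradient differences y_i

B = multisecant_broyden(S, Y)
H, mu = symmetrize_and_shift(B)

g = grad(X[:, -1])                    # gradient at the latest iterate
d = -np.linalg.solve(H, g)            # quasi-Newton step using H

print("B symmetric:", np.allclose(B, B.T))
print("shift mu:", mu)
print("g @ d:", g @ d)
```

Because `H` is symmetric positive definite by construction, the step `d` is a descent direction. Note that symmetrizing and shifting generally means `H @ S` no longer equals `Y` exactly, which is one reason a decaying perturbation is of interest.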
