Efficient globalization of heavy-ball type methods for unconstrained optimization based on curve searches
Federica Donnini, Matteo Lapucci, Pierluigi Mansueto
TL;DR
This paper develops a curve-search globalization framework for unconstrained optimization, focusing on Polyak's heavy-ball updates. The method backtracks along a parametric search curve starting from the tentative update to enforce an Armijo-type decrease, enabling global convergence even in nonconvex settings. By analyzing quadratic search curves, it proves worst-case complexity $\mathcal{O}(\epsilon^{-2})$ to reach approximate stationarity and shows that heavy-ball steps can be retained in strongly convex problems. Numerical experiments on strongly convex and CUTEst problems demonstrate practical efficiency gains over classical safeguards, with code available publicly.
Abstract
In this work, we deal with unconstrained nonlinear optimization problems. Specifically, we are interested in methods whose updates may follow directions that are not descent directions, like Polyak's heavy-ball algorithm. Instead of enforcing convergence properties through line searches and modifications of the search direction when suitable safeguards are not satisfied, we propose a strategy based on searches along curves: a curve search starting from the first tentative update makes it possible to revert smoothly towards a gradient-related direction if a sufficient decrease condition is not met. The resulting algorithm provably possesses global convergence guarantees, even with a nonmonotone decrease condition. While the presented framework is rather general, the case of parabolic searches is of particular interest; in this case, under reasonable assumptions, the resulting algorithm can be shown to possess optimal worst-case complexity bounds for reaching approximate stationarity in nonconvex settings. Practically, we show that the proposed globalization strategy allows (optimal) pure heavy-ball steps to be accepted consistently in the strongly convex case, whereas standard globalization approaches would at times reject them before even evaluating the objective function. Preliminary computational experiments also suggest that the proposed framework might be more convenient than classical safeguard-based approaches.
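To make the idea of a parabolic curve search concrete, here is a minimal Python sketch. It is not the authors' implementation. The curve $x(t) = x_k + t\,d_k + t^2 s_k$, with $d_k = -\alpha \nabla f(x_k)$ and $s_k = \beta (x_k - x_{k-1})$, is an assumption chosen for illustration: at $t = 1$ it reproduces the pure heavy-ball step, and as $t \to 0$ it becomes tangent to the gradient-related direction $d_k$. The function name, the parameters `sigma`, `delta`, `max_backtracks`, and the monotone Armijo-type test are likewise hypothetical choices.

```python
import numpy as np


def curve_search_heavy_ball(f, grad, x0, alpha, beta, sigma=1e-4, delta=0.5,
                            tol=1e-6, max_iter=10_000, max_backtracks=60):
    """Heavy-ball iterations globalized by backtracking along a parabola.

    Illustrative assumption: the search curve is x(t) = x + t*d + t**2 * s,
    with d = -alpha * grad(x) and s = beta * (x - x_prev).
    Returns the final point and the number of iterations performed.
    """
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()                      # no momentum at the first iteration
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:       # approximate stationarity reached
            return x, k
        d = -alpha * g                     # gradient-related part of the step
        s = beta * (x - x_prev)            # momentum part of the step
        fx = f(x)
        slope = g @ d                      # negative whenever g is nonzero
        t = 1.0                            # t = 1 is the pure heavy-ball step
        for _ in range(max_backtracks):
            x_trial = x + t * d + t**2 * s
            if f(x_trial) <= fx + sigma * t * slope:   # Armijo-type decrease
                break
            t *= delta                     # shrink t: the curve bends towards d
        else:
            return x, k                    # no acceptable step found; stop here
        x_prev, x = x, x_trial
    return x, max_iter


# Usage on a strongly convex quadratic with the classical heavy-ball parameters.
A = np.diag([1.0, 10.0])
mu, L = 1.0, 10.0
alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
x_final, iters = curve_search_heavy_ball(
    lambda x: 0.5 * x @ A @ x, lambda x: A @ x, np.array([1.0, 1.0]), alpha, beta
)
```

In this sketch the tentative heavy-ball step is always evaluated first, and the momentum term is damped quadratically in $t$ only when the decrease test fails, rather than being discarded by a safeguard before the objective is evaluated.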
