A Study of Condition Numbers for First-Order Optimization
Charles Guille-Escuret, Baptiste Goujaud, Manuela Girotti, Ioannis Mitliagkas
TL;DR
The paper argues that the classical tuning of first-order optimization algorithms (FOA) based on $L$-smoothness and $\mu$-strong convexity is brittle under small perturbations of the objective. It introduces a star-norm $\|\cdot\|_*$ to measure the size of a perturbation, together with a continuity framework for FOA behavior, and shows that FOA trajectories change little under perturbations of small star-norm, whereas the smoothness and strong convexity constants can change drastically. To support robust tuning, it defines families of continuous upper and lower conditions, describes how they relate to one another, and gives convergence guarantees for Gradient Descent under various pairs of upper and lower conditions. It then advocates using these continuous conditions in practice for tuning and convergence analysis, illustrated by potential improvements in Heavy Ball (HB) behavior on challenging functions and by connections to logistic regression.
Abstract
The study of first-order optimization algorithms (FOA) typically starts with assumptions on the objective functions, most commonly smoothness and strong convexity. These metrics are used to tune the hyperparameters of FOA. We introduce a class of perturbations quantified via a new norm, called the *-norm. We show that adding a small perturbation to the objective function has an equivalently small impact on the behavior of any FOA, which suggests that it should have a minor impact on the tuning of the algorithm. However, we show that smoothness and strong convexity can be heavily impacted by arbitrarily small perturbations, leading to excessively conservative tunings and convergence issues. In view of these observations, we propose a notion of continuity of the metrics, which is essential for a robust tuning strategy. Since smoothness and strong convexity are not continuous, we propose a comprehensive study of existing alternative metrics, which we prove to be continuous. We describe their mutual relations and provide their guaranteed convergence rates for the Gradient Descent algorithm tuned accordingly. Finally, we discuss how our work impacts the theoretical understanding of FOA and their performance.
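The brittleness described in the abstract can be illustrated with a one-dimensional toy example. The sketch below is not taken from the paper: the objective $f(x) = x^2/2$, the perturbation $p(x) = \epsilon^3 \cos(x/\epsilon^2)$, the step size, and all function names are assumptions chosen for illustration. The size of the perturbation is measured here by the uniform bound $|p'(x)| \le \epsilon$ on its gradient, which is only a stand-in for the *-norm, whose definition is not given in this section.

```python
import math

EPS = 0.01  # perturbation scale (illustrative choice, not from the paper)


def grad_f(x):
    """Gradient of f(x) = x^2 / 2, which is 1-smooth and 1-strongly convex."""
    return x


def grad_p(x, eps=EPS):
    """Gradient of p(x) = eps^3 * cos(x / eps^2); bounded by eps in magnitude."""
    return -eps * math.sin(x / eps**2)


def hess_p(x, eps=EPS):
    """Second derivative of p; its magnitude reaches 1 / eps."""
    return -math.cos(x / eps**2) / eps


def gradient_descent(grad, x0, step, n_steps):
    """Plain gradient descent; returns the full list of iterates."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] - step * grad(xs[-1]))
    return xs


# Run the same algorithm, with the same step size, on f and on f + p.
clean = gradient_descent(grad_f, x0=1.0, step=0.5, n_steps=50)
perturbed = gradient_descent(lambda x: grad_f(x) + grad_p(x),
                             x0=1.0, step=0.5, n_steps=50)
max_gap = max(abs(a - b) for a, b in zip(clean, perturbed))

# Sample the curvature f'' + p'' = 1 + p'' of the perturbed objective.
grid = [i * 1e-5 for i in range(-1000, 1001)]
curvature = [1.0 + hess_p(x) for x in grid]

print("largest gap between the two trajectories:", max_gap)
print("sampled curvature range of f + p:", min(curvature), max(curvature))
```

In this toy setting the two trajectories never differ by more than $\epsilon$, since the gap $d_k$ satisfies $|d_{k+1}| \le 0.5\,|d_k| + 0.5\,\epsilon$ with $d_0 = 0$. The curvature of the perturbed objective, however, ranges over $[1 - 1/\epsilon,\ 1 + 1/\epsilon]$, so its smoothness constant grows like $1/\epsilon$ and strong convexity is lost. A step size tuned as $1/L$ on the perturbed objective would therefore be roughly a hundred times smaller than needed, which is the kind of overly conservative tuning the abstract refers to.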
