Adaptive Conditional Gradient Descent
Abbas Khademi, Antonio Silveti-Falls
TL;DR
This work tackles the challenge of step-size selection for projection-free first-order optimization with linear minimization oracles, unifying Conditional Gradient and non-Euclidean Normalized Steepest Descent. It introduces Adaptive Conditional Gradient Descent, which uses a gradient-difference-based local Lipschitz estimate, warm-starts the backtracking, and offers two scaling variants to adapt to local curvature. The authors provide convergence guarantees across non-convex, quasar-convex, and strongly convex settings and validate the approach through extensive experiments on constrained and unconstrained problems, showing faster convergence and competitive or superior solution quality. The method is practical, versatile across geometries, and particularly valuable for large-scale, geometry-aware optimization where global Lipschitz constants are unknown or misleading.
Abstract
Selecting an effective step-size is a fundamental challenge in first-order optimization, especially for problems with non-Euclidean geometries. This paper presents a novel adaptive step-size strategy for optimization algorithms that rely on linear minimization oracles, as used in the Conditional Gradient or non-Euclidean Normalized Steepest Descent algorithms. Using a simple heuristic to estimate a local Lipschitz constant for the gradient, we can determine step-sizes that guarantee sufficient decrease at each iteration. More precisely, we establish convergence guarantees for our proposed Adaptive Conditional Gradient Descent algorithm, which covers as special cases both the classical Conditional Gradient algorithm and non-Euclidean Normalized Steepest Descent algorithms with adaptive step-sizes. Our analysis covers optimization of continuously differentiable functions in non-convex, quasar-convex, and strongly convex settings, achieving convergence rates that match state-of-the-art theoretical bounds. Comprehensive numerical experiments validate our theoretical findings and illustrate the practical effectiveness of Adaptive Conditional Gradient Descent. The results exhibit competitive performance, underscoring the potential of the adaptive step-size for applications.
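To make the idea of an adaptive step-size driven by a local Lipschitz estimate concrete, the following is a minimal sketch of a Conditional Gradient (Frank-Wolfe) loop over the probability simplex with a backtracked estimate `L`. It is an illustration under stated assumptions, not the authors' algorithm: it uses a function-value sufficient-decrease test against a quadratic upper model instead of the gradient-difference estimate mentioned in the TL;DR, it measures the direction in the Euclidean norm, and all names (`adaptive_cg`, `lmo_simplex`, `L0`, the halving and doubling factors) are hypothetical choices made for this example.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def lmo_simplex(grad: Vector) -> Vector:
    """Linear minimization oracle over the probability simplex:
    returns the vertex e_i with i = argmin_i grad[i]."""
    i = min(range(len(grad)), key=grad.__getitem__)
    s = [0.0] * len(grad)
    s[i] = 1.0
    return s


def adaptive_cg(
    f: Callable[[Vector], float],
    grad_f: Callable[[Vector], Vector],
    x0: Vector,
    L0: float = 1.0,
    iters: int = 200,
    tol: float = 1e-8,
) -> Tuple[Vector, float]:
    """Frank-Wolfe with a backtracked local Lipschitz estimate (illustrative)."""
    x, L = list(x0), L0
    for _ in range(iters):
        g = grad_f(x)
        s = lmo_simplex(g)
        d = [si - xi for si, xi in zip(s, x)]
        gap = -dot(g, d)  # Frank-Wolfe gap, nonnegative on the feasible set
        if gap <= tol:
            break
        dd = dot(d, d)  # squared Euclidean norm of the direction (assumption)
        fx = f(x)
        # Warm start: optimistically shrink the previous estimate (assumed factor 2).
        L = max(L / 2.0, 1e-12)
        while True:
            gamma = min(1.0, gap / (L * dd))
            x_new = [xi + gamma * di for xi, di in zip(x, d)]
            # Accept once the quadratic upper model with the current L holds.
            model = fx - gamma * gap + 0.5 * L * gamma ** 2 * dd
            if f(x_new) <= model + 1e-12:
                break
            L *= 2.0  # estimate too small: increase and retry
        x = x_new
    return x, L


# Usage: project c onto the simplex by minimizing 0.5 * ||x - c||^2.
c = [0.2, 0.3, 0.5]
f = lambda x: 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
grad_f = lambda x: [xi - ci for xi, ci in zip(x, c)]
x0 = [1.0, 0.0, 0.0]
x_star, L_est = adaptive_cg(f, grad_f, x0)
```

The step `gamma = min(1, gap / (L * dd))` minimizes the quadratic model over `[0, 1]`, so each accepted step satisfies a sufficient-decrease condition without requiring a global Lipschitz constant. Halving `L` at the start of every iteration lets the estimate track smaller local curvature, at the cost of occasional extra function evaluations.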
