Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features
Aleksandr Beznosikov, David Dobre, Gauthier Gidel
TL;DR
This work introduces two stochastic Frank–Wolfe variants for constrained finite-sum minimization: Sarah Frank–Wolfe (fw_sarah) and Saga Sarah Frank–Wolfe (fw_zerosarah). Both combine variance-reduction techniques (SARAH, SAGA) with a projection-free scheme that accesses the constraint set only through a linear minimization oracle (LMO). They achieve state-of-the-art convergence guarantees for both convex and non-convex objectives while avoiding large-batch strategies and full-gradient computations when possible. The analysis uses a Lyapunov quantity that tracks gradient-estimator accuracy, yielding explicit rates and optimal parameter choices (e.g., $p$ and $b$) together with favorable LMO and stochastic oracle complexities. Experiments on LibSVM datasets support the theory, showing performance competitive with or superior to existing projection-free baselines. The work also outlines directions for extending the approach to strongly convex problems and to distributed settings with compression.
Abstract
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints that arise in machine learning applications. In recent years, stochastic versions of FW have gained popularity, motivated by large datasets for which computing the full gradient is prohibitively expensive. In this paper, we present two new variants of the FW algorithm for stochastic finite-sum minimization. Our algorithms have the best convergence guarantees among existing stochastic FW approaches for both convex and non-convex objective functions. Our methods do not need to collect large batches at every iteration, an issue common to many stochastic projection-free approaches. Moreover, our second approach requires neither large batches nor full deterministic gradients, the need for which is a typical weakness of many techniques for finite-sum problems. The faster theoretical rates of our approaches are confirmed experimentally.
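To make the general idea concrete, the following is a minimal sketch of a Frank-Wolfe loop driven by a SARAH-style recursive gradient estimator. It is an illustration under stated assumptions, not the paper's algorithm or its tuned parameters: the least-squares objective, the $\ell_1$-ball constraint, the step size $2/(k+2)$, the reading of $p$ as a full-gradient refresh probability and of $b$ as a mini-batch size, and the names `lmo_l1_ball`, `objective`, and `sarah_fw_sketch` are all hypothetical choices made for this example.

```python
import numpy as np


def lmo_l1_ball(g, radius):
    """LMO for the L1 ball: argmin over ||s||_1 <= radius of <g, s>."""
    i = int(np.argmax(np.abs(g)))        # coordinate with the largest |gradient|
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])       # vertex of the ball opposing the gradient
    return s


def objective(A, y, x):
    """Illustrative finite sum: f(x) = (1/2n) * sum_i (a_i^T x - y_i)^2."""
    r = A @ x - y
    return 0.5 * float(r @ r) / A.shape[0]


def sarah_fw_sketch(A, y, radius=1.0, iters=500, b=8, p=0.1, seed=0):
    """Frank-Wolfe with a SARAH-style estimator (hypothetical sketch).

    Assumptions: p is the probability of recomputing the full gradient,
    b is the mini-batch size (b <= n), and the step size is 2 / (k + 2).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)                      # feasible starting point
    g = A.T @ (A @ x - y) / n            # initialize with one full gradient
    for k in range(iters):
        s = lmo_l1_ball(g, radius)       # projection-free direction from the LMO
        eta = 2.0 / (k + 2)
        x_new = x + eta * (s - x)        # convex combination keeps x feasible
        if rng.random() < p:
            g = A.T @ (A @ x_new - y) / n            # occasional full refresh
        else:
            idx = rng.choice(n, size=b, replace=False)
            Ab = A[idx]
            # SARAH recursion: add the mini-batch average of
            # grad f_i(x_new) - grad f_i(x), which is a_i a_i^T (x_new - x) here.
            g = g + Ab.T @ (Ab @ (x_new - x)) / b
        x = x_new
    return x
```

Calling `sarah_fw_sketch(A, y)` on a data matrix `A` of shape `(n, d)` and targets `y` returns an iterate inside the $\ell_1$ ball of the given radius. Each iterate is a convex combination of feasible points, so no projection is needed; the only access to the constraint set is through `lmo_l1_ball`.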
