Blended Conditional Gradients: the unconditioning of conditional gradients
Gábor Braun, Sebastian Pokutta, Dan Tu, Stephen Wright
TL;DR
The paper introduces Blended Conditional Gradients (BCG), a projection-free method for minimizing smooth convex functions over polytopes that blends Frank–Wolfe steps with simplex-based gradient descent steps. It uses a weak-separation oracle and a simplex descent oracle to work efficiently within the active vertex set, and achieves linear convergence for strongly convex objectives through an analysis based on simplicial curvature and geometric strong convexity. The theory shows that $f(x_T)-f(x^*) \to 0$ linearly, with $T = O\bigl(\frac{C^{\triangle}}{\mu}\log\frac{\Phi_0}{\varepsilon}\bigr)$ iterations sufficing for accuracy $\varepsilon$, where $C^{\triangle}$ is the simplicial curvature, $\mu$ is the geometric strong convexity constant, and $\Phi_0$ is the initial gap estimate. Experiments on Lasso, video co-localization, structured regression, matrix completion, and sparse recovery show substantial speedups and sparser solutions compared to standard CG variants. The work emphasizes projection-free operation, sparse representations, and lazy oracle evaluation; simplex-specific variants and further enhancements improve practical performance, and the authors suggest extensions to broader growth conditions and acceleration strategies.
Abstract
We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P, combining the Frank–Wolfe algorithm (also called conditional gradient) with gradient-based steps, different from away steps and pairwise steps, but still achieving linear convergence for strongly convex functions, along with good practical performance. Our approach retains all favorable properties of conditional gradient algorithms, notably avoidance of projections onto P and maintenance of iterates as sparse convex combinations of a limited number of extreme points of P. The algorithm is lazy, making use of inexpensive inexact solutions of the linear programming subproblem that characterizes the conditional gradient approach. It decreases measures of optimality (primal and dual gaps) rapidly, both in the number of iterations and in wall-clock time, outperforming even the lazy conditional gradient algorithms of [arXiv:1410.8816]. We also present a streamlined version of the algorithm for the probability simplex.
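To make the blending idea concrete, here is a minimal sketch for the probability simplex, where the vertices are the coordinate vectors and the active set is simply the support of the iterate. It is an illustration under simplifying assumptions, not the authors' implementation: the objective is fixed to $f(x) = \frac{1}{2}\|Ax-b\|^2$ so that line search has a closed form, the linear subproblem is solved exactly with `argmin` instead of a lazy weak-separation oracle, and the names `bcg_simplex_sketch`, `K`, and `tol` are hypothetical.

```python
import numpy as np

def bcg_simplex_sketch(A, b, max_iter=1000, K=2.0, tol=1e-8):
    """Illustrative blended conditional gradient loop on the probability
    simplex for f(x) = 0.5 * ||A x - b||^2 (assumed objective)."""
    n = A.shape[1]
    f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
    grad = lambda x: A.T @ (A @ x - b)

    def step_size(x, d):
        # Exact minimizer of f(x + t d) over t in [0, 1] for the quadratic f.
        Ad = A @ d
        denom = Ad @ Ad
        if denom == 0.0:
            return 0.0
        return float(np.clip(-(grad(x) @ d) / denom, 0.0, 1.0))

    x = np.zeros(n)
    x[0] = 1.0                            # start at a vertex
    g = grad(x)
    phi = (g @ x - g.min()) / 2.0         # gap estimate: half the initial FW gap

    for _ in range(max_iter):
        g = grad(x)
        S = np.flatnonzero(x > 0)         # active set = support of x
        local_gap = g[S].max() - g[S].min()

        if local_gap >= phi and local_gap > 0:
            # Simplex descent step: move inside conv(S) along the
            # gradient restricted to S, centered so the sum stays 1.
            d = np.zeros(n)
            d[S] = g[S] - g[S].mean()
            pos = np.flatnonzero(d > 0)
            ratios = x[pos] / d[pos]
            j = pos[np.argmin(ratios)]
            y = np.maximum(x - ratios.min() * d, 0.0)
            y[j] = 0.0                    # this coordinate hits the boundary
            if f(y) <= f(x):
                x = y                     # drop a vertex from the active set
            else:
                x = x + step_size(x, y - x) * (y - x)
        else:
            # Frank-Wolfe branch: exact linear minimization over all vertices.
            v = int(np.argmin(g))
            fw_gap = g @ x - g[v]
            if fw_gap >= phi / K:
                d = -x.copy()
                d[v] += 1.0               # direction e_v - x
                x = x + step_size(x, d) * d
            elif fw_gap <= tol:
                break                     # FW gap bounds the primal gap
            else:
                phi /= 2.0                # no good vertex found: halve the estimate
    return x
```

The loop only calls the linear minimization step when the gradient spread over the current support is small relative to the gap estimate; otherwise it keeps making cheap descent steps over the vertices it already holds. For example, `bcg_simplex_sketch(np.eye(5), np.array([0.5, 0.3, 0.2, 0.0, 0.0]))` should return a point close to `b`, since `b` already lies in the simplex.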
