Fast convergence of Frank-Wolfe algorithms on polytopes

Elias Wirth, Javier Pena, Sebastian Pokutta

TL;DR

This paper develops an affine-invariant template for deriving convergence rates of Frank-Wolfe (FW) variants on polytopes, relying on two core properties: extended curvature and a Hölderian error bound. The framework yields rates that interpolate between sublinear and linear, controlled by the exponent $\theta \in (0,\tfrac{1}{2}]$, and it applies uniformly to vanilla FW, away-step FW (AFW), blended pairwise FW (BPFW), and in-face FW (IFW). Convergence guarantees are expressed via problem-distance measures tied to geometry: radial distance $\mathfrak{r}$ for vanilla FW, vertex distance $\mathfrak{v}$ for AFW/BPFW, and face distance $\mathfrak{f}$ for IFW, with dimension-dependent refinements for standard-form polytopes and simplex-like polytopes. A key insight is that local facial geometry drives the error bounds, enabling dimension-independent rates and sharpening comparisons to global-width-based analyses, while recovering and extending prior results for simplex-like structures.

Abstract

We provide a template to derive convergence rates for the following popular versions of the Frank-Wolfe algorithm on polytopes: vanilla Frank-Wolfe, Frank-Wolfe with away steps, Frank-Wolfe with blended pairwise steps, and Frank-Wolfe with in-face directions. Our template shows how the convergence rates follow from two affine-invariant properties of the problem, namely, error bound and extended curvature. These properties depend solely on the polytope and objective function but not on any affine-dependent object like norms. For each one of the above algorithms, we derive rates of convergence ranging from sublinear to linear depending on the degree of the error bound.
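
To make the baseline algorithm concrete, the following is a minimal sketch of vanilla Frank-Wolfe on one simple polytope, the probability simplex. The quadratic objective, the target point `b`, the open-loop step size $2/(t+2)$, and the function name `frank_wolfe_simplex` are illustrative assumptions and are not taken from the paper.

```python
def frank_wolfe_simplex(grad, x0, num_iters=200):
    """Vanilla Frank-Wolfe over the probability simplex (illustrative sketch)."""
    x = list(x0)
    for t in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, <g, v> is minimized
        # at the vertex e_i with the smallest gradient coordinate g_i.
        i_min = min(range(len(x)), key=lambda i: g[i])
        # Open-loop step size 2 / (t + 2), a common textbook choice (assumption).
        gamma = 2.0 / (t + 2.0)
        # Convex combination x <- (1 - gamma) * x + gamma * e_{i_min}.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_min] += gamma
    return x


# Hypothetical objective: f(x) = 0.5 * ||x - b||^2 with b inside the simplex,
# so the minimizer over the simplex is b itself and the optimal value is 0.
b = [0.2, 0.5, 0.3]

def f(x):
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))

def grad_f(x):
    return [xi - bi for xi, bi in zip(x, b)]

x0 = [1.0, 0.0, 0.0]          # start at a vertex of the simplex
x_fw = frank_wolfe_simplex(grad_f, x0, num_iters=500)
gap = f(x_fw) - f(b)          # primal gap after 500 iterations
```

Each iterate stays in the polytope because it is a convex combination of the previous iterate and a vertex; the away-step, blended pairwise, and in-face variants named above change how the update direction is chosen and are not sketched here.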

Paper Structure

This paper contains 13 sections, 19 theorems, 107 equations, 6 figures, 1 table, 4 algorithms.

Key Result

Lemma 2.6

Suppose that $p > 0$ and $(\beta_t)_{t\in\mathbb{N}}, (\sigma_t)_{t\in\mathbb{N}}$ are such that $\beta_t, \sigma_t \geq 0$ and $\beta_{t+1}\leq (1-\sigma_t\beta_t^p)\beta_t$ for $t \in \mathbb{N}$. Then, for all $t\in \mathbb{N}$, it holds that
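
The inequality that concludes Lemma 2.6 is not shown here. As a hedged illustration, and not necessarily the paper's exact statement, one bound that follows from the stated recursion when $\beta_0 > 0$ is $\beta_t \leq \big(\beta_0^{-p} + p\sum_{i=0}^{t-1}\sigma_i\big)^{-1/p}$: since $(1-x)^{-p} \geq 1 + px$ for $x \in [0,1)$, the recursion gives $\beta_{t+1}^{-p} \geq \beta_t^{-p} + p\,\sigma_t$, and summing over $t$ yields the claim. The sketch below checks this numerically; the function names and the parameter values are assumptions chosen for illustration.

```python
def recursion_bound(beta0, sigmas, p):
    """Candidate bounds (beta0^-p + p * sum_{i<t} sigma_i)^(-1/p), t = 0..len(sigmas)."""
    bounds, total = [], 0.0
    for t in range(len(sigmas) + 1):
        bounds.append((beta0 ** (-p) + p * total) ** (-1.0 / p))
        if t < len(sigmas):
            total += sigmas[t]
    return bounds


def simulate(beta0, sigmas, p):
    """Iterate the recursion with equality: beta_{t+1} = (1 - sigma_t * beta_t^p) * beta_t."""
    betas = [beta0]
    for s in sigmas:
        b = betas[-1]
        betas.append((1.0 - s * b ** p) * b)
    return betas


# Illustrative parameters (assumptions): p = 1 with a constant sigma_t = 0.5.
p = 1.0
beta0 = 1.0
sigmas = [0.5] * 50

betas = simulate(beta0, sigmas, p)
bounds = recursion_bound(beta0, sigmas, p)

# True if every simulated beta_t sits below the candidate bound.
holds = all(b <= u + 1e-12 for b, u in zip(betas, bounds))
```

With $p = 1$ and constant $\sigma_t = \sigma$, the candidate bound reads $\beta_t \leq 1/(\beta_0^{-1} + \sigma t)$, a sublinear $O(1/t)$ decay.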

Figures (6)

  • Figure 1: Inner facial distance $\Phi(F,\mathcal{C})$ equals the length of the dotted line.
  • Figure 2: Outer facial distance $\bar{\Phi}(F,\mathcal{C})$ equals the length of the dotted line(s).
  • Figure 3: Visualization of the function $\mathfrak{r}(\cdot,{\mathbf{x}})$ on $\mathcal{C}$ for the radial distance $\mathfrak{r}$ of $\mathcal{C}$. The shade of gray indicates the function value. Points ${\mathbf{y}}\in \mathcal{C}$ with $\mathfrak{r}({\mathbf{y}},{\mathbf{x}}) = 0$ are in white and points ${\mathbf{y}}\in \mathcal{C}$ with $\mathfrak{r}({\mathbf{y}},{\mathbf{x}})=1$ are in black.
  • Figure 4: Visualization of the function $\mathfrak{r}({\mathbf{y}},\cdot)$ on $\mathcal{C}$ for the radial distance $\mathfrak{r}$ of $\mathcal{C}$. The shade of gray indicates the function value. Points ${\mathbf{x}}\in \mathcal{C}$ with $\mathfrak{r}({\mathbf{y}},{\mathbf{x}}) = 0$ are in white and points ${\mathbf{x}}\in \mathcal{C}$ with $\mathfrak{r}({\mathbf{y}},{\mathbf{x}})=1$ are in black.
  • Figure 5: Visualization of the function $\mathfrak{v}({\mathbf{y}},\cdot)$ on $\mathcal{C}$ for the vertex distance $\mathfrak{v}$ of $\mathcal{C}$. The shade of gray indicates the function value. Points ${\mathbf{x}}\in\mathcal{C}$ with $\mathfrak{v}({\mathbf{y}},{\mathbf{x}}) = 0$ are in white and points ${\mathbf{x}}\in\mathcal{C}$ with $\mathfrak{v}({\mathbf{y}},{\mathbf{x}})=1$ are in black.
  • ...and 1 more figure

Theorems & Definitions (44)

  • Definition 2.1: Smoothness
  • Definition 2.2: Hölderian error bound
  • Definition 2.3: Curvature
  • Definition 2.4: Error bound
  • Definition 2.5: Inner and outer facial distances
  • Lemma 2.6
  • Definition 3.1: Radial distance
  • Theorem 3.2: FW
  • Lemma 3.3
  • proof
  • ...and 34 more