Fast Frank-Wolfe Algorithms with Adaptive Bregman Step-Size for Weakly Convex Functions

Shota Takahashi, Sebastian Pokutta, Akiko Takeda

TL;DR

This paper proposes Frank-Wolfe algorithms with an adaptive Bregman step-size strategy for smooth adaptable convex functions. It establishes convergence guarantees in various settings, with rates ranging from sublinear to linear depending on the assumptions placed on convex and nonconvex objective functions.

Abstract

We propose Frank-Wolfe (FW) algorithms with an adaptive Bregman step-size strategy for smooth adaptable (also called relatively smooth) (weakly) convex functions. That is, the gradient of the objective function need not be Lipschitz continuous; we require only the smooth adaptable property. These assumptions are less restrictive than those of existing FW algorithms. We establish convergence guarantees in various settings, with rates ranging from sublinear to linear depending on the assumptions for convex and nonconvex objective functions. Assuming that the objective function is weakly convex and satisfies the local quadratic growth condition, we provide both local sublinear and local linear convergence with respect to the primal gap. We also propose a variant of the away-step FW algorithm using Bregman distances over polytopes. For this variant, we establish faster global convergence (up to a linear rate) for convex optimization under the Hölder error bound condition and local linear convergence for nonconvex optimization under the local quadratic growth condition. Numerical experiments demonstrate that our proposed FW algorithms outperform existing methods.
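To make the idea of an adaptive step-size concrete, the following is a minimal sketch of a Frank-Wolfe iteration with a backtracking estimate of the smoothness constant. It is not the paper's algorithm: it assumes the Euclidean kernel $\phi = \frac{1}{2}\|\cdot\|^2$ (so the Bregman distance reduces to $\frac{1}{2}\|x - y\|^2$), takes the probability simplex as the feasible set, and uses a standard sufficient-decrease test. The function name `fw_backtracking_simplex` and all default parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def fw_backtracking_simplex(f, grad, x0, L0=1.0, eta=0.9, tau=2.0,
                            iters=200, tol=1e-8):
    """Frank-Wolfe on the probability simplex with a backtracking estimate L.

    Assumption: Euclidean kernel phi = 0.5*||.||^2, i.e. the classical
    Lipschitz-gradient setting, not the general Bregman setting of the paper.
    """
    x, L = np.asarray(x0, dtype=float).copy(), float(L0)
    for _ in range(iters):
        g = grad(x)
        v = np.zeros_like(x)
        v[np.argmin(g)] = 1.0          # linear minimization oracle: best vertex
        d = v - x                      # Frank-Wolfe direction
        gap = -(g @ d)                 # Frank-Wolfe gap, nonnegative
        if gap <= tol:
            break
        d2 = d @ d
        L *= eta                       # optimistically shrink the estimate
        for _ in range(60):            # bounded backtracking as a safeguard
            gamma = min(gap / (L * d2), 1.0)
            model = f(x) - gamma * gap + 0.5 * L * gamma**2 * d2
            if f(x + gamma * d) <= model:
                break                  # sufficient decrease holds
            L *= tau                   # estimate too small: enlarge and retry
        x = x + gamma * d              # convex combination stays feasible
    return x, L
```

A hypothetical usage: with `c = np.array([0.2, 0.3, 0.5])`, `f = lambda x: 0.5 * np.sum((x - c) ** 2)` and `grad = lambda x: x - c`, calling `fw_backtracking_simplex(f, grad, np.array([1.0, 0.0, 0.0]))` returns a feasible point whose objective value is lower than at the starting vertex. Replacing the quadratic term in `model` with a Bregman distance of a different kernel would be the natural direction toward the setting the abstract describes, but the exact rule used in the paper is not reproduced here.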

Paper Structure

This paper contains 44 sections, 27 theorems, 110 equations, 12 figures, 8 tables, 3 algorithms.

Key Result

Theorem 3.3

Let $L_{-1}$ be the initial $L$-smad estimate and $n_t$ be the total number of evaluations of the primal progress inequality (ineq:primal-progress) up to iteration $t$. Then we have

$$n_t \leq \max\left\{\left(1 - \frac{\log\eta}{\log\tau}\right)(t+1) + \frac{\max\{\log(\tau L/L_{-1}), 0\}}{\log\tau},\ \left(1 + \frac{\log\nu}{\log\beta}\right)(t+1)\right\}.$$
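As a quick way to read the bound in Theorem 3.3, the sketch below evaluates its right-hand side numerically. The function name `evaluation_bound` is hypothetical, and this excerpt does not state the admissible ranges of $\eta$, $\tau$, $\nu$ and $\beta$, so the values in the usage note are illustrative assumptions only.

```python
import math

def evaluation_bound(t, L, L_init, eta, tau, nu, beta):
    """Right-hand side of the bound on n_t in Theorem 3.3.

    t       iteration index (the bound covers iterations 0..t)
    L       smooth adaptable constant appearing in the theorem
    L_init  initial estimate L_{-1}
    eta, tau, nu, beta  step-size parameters; their valid ranges are not
                        given in this excerpt and are assumed by the caller
    """
    # First term: backtracking on the L-smad estimate.
    first = (1.0 - math.log(eta) / math.log(tau)) * (t + 1) \
        + max(math.log(tau * L / L_init), 0.0) / math.log(tau)
    # Second term: the analogous count governed by nu and beta.
    second = (1.0 + math.log(nu) / math.log(beta)) * (t + 1)
    return max(first, second)
```

For example, with the assumed values `eta=1.0`, `tau=2.0`, `nu=1.0`, `beta=0.5` and `L = L_init = 1.0`, the first term is $(t+1) + 1$ and the second is $t+1$, so `evaluation_bound(9, 1.0, 1.0, 1.0, 2.0, 1.0, 0.5)` evaluates to 11. In that case the bound allows on average only slightly more than one evaluation per iteration.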

Figures (12)

  • Figure 1: Log plot of primal and FW gaps on $\ell_p$ loss for gas sensor data with $b_{\max} = 130$.
  • Figure 2: Log plot of primal and FW gaps on $\ell_p$ loss for gas sensor data with $b_{\max} = 200$.
  • Figure 3: Log plot of primal and FW gaps on phase retrieval for $(m, n) = (1000,10000)$.
  • Figure 4: Log plot of primal and FW gaps on phase retrieval for $(m, n) = (2000,10000)$.
  • Figure 5: Log-log plot of primal and FW gaps on nonnegative linear inverse problem for $(m, n) = (100,1000)$.
  • ...and 7 more figures

Theorems & Definitions (65)

  • Definition 2.1: Kernel Generating Distance [Bolte2018-zt]
  • Definition 2.2: Bregman Distance [Bregman1967-rf]
  • Definition 2.3: $L$-smooth Adaptable Property
  • Definition 2.4: Hölder Error Bound [Braun2025-sz, alex17sharp] and Quadratic Growth Conditions [Garber2015-de, Garber2020-tr, Liao2024-vd]
  • Remark 3.2: Well-definedness and termination of the adaptive Bregman step-size algorithm (alg:adaptive-bregman)
  • Theorem 3.3
  • Theorem 4.2: Linear convergence of FW algorithm with short step-size or adaptive step-size
  • Theorem 4.4: Linear convergence of the away-step FW algorithm
  • Remark 5.2
  • Theorem 5.3: Local linear convergence of FW algorithm with short step-size or adaptive step-size
  • ...and 55 more
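
For orientation, the two notions behind Definitions 2.2 and 2.3 can be written as follows. These are the standard forms from the literature on Bregman methods, given here as an assumption about what the paper uses; the paper's own statements may include additional domain and regularity conditions on the kernel $\phi$ that are not reproduced in this listing.

```latex
% Bregman distance generated by a differentiable convex kernel \phi
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle

% L-smooth adaptable (L-smad) property of the pair (f, \phi):
% both L\phi - f and L\phi + f are convex, which is equivalent to
\bigl| f(x) - f(y) - \langle \nabla f(y),\, x - y \rangle \bigr| \le L\, D_\phi(x, y)

% Euclidean special case: \phi = \tfrac{1}{2}\|\cdot\|^2 gives
% D_\phi(x, y) = \tfrac{1}{2}\|x - y\|^2, and the inequality above
% is the usual descent lemma for an L-Lipschitz gradient.
```

The last comment shows why the smooth adaptable property generalizes Lipschitz continuity of the gradient: choosing a non-Euclidean kernel lets the inequality hold for objectives whose gradients are not Lipschitz continuous.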