Scalable Frank-Wolfe on Generalized Self-concordant Functions via Simple Steps

Alejandro Carderera, Mathieu Besançon, Sebastian Pokutta

TL;DR

This work provides a projection-free Frank-Wolfe framework for generalized self-concordant objectives. It proves that the simple open-loop step size $\gamma_t=2/(t+2)$ yields $\mathcal{O}(1/t)$ convergence in both the primal gap and the Frank-Wolfe gap, without line searches or higher-order information. It further establishes faster rates in several practical settings. These include linear convergence when the optimum lies in the interior and when the feasible set is uniformly or strongly convex, with rates for uniformly convex sets that depend on the degree $q$ of uniform convexity. Over polytopes, the Away-step Frank-Wolfe (AFW) and Blended Pairwise Conditional Gradients (BPCG) algorithms with backtracking achieve linear rates. Experiments on portfolio optimization, logistic regression, and the Birkhoff polytope demonstrate competitive performance and numerical robustness, including on ill-conditioned instances, while a stateless variant offers a simpler update rule with comparable guarantees. Overall, the paper advances scalable, parameter-free, projection-free optimization for a broad class of self-concordant-like objectives and clarifies how step-size strategies and geometric assumptions affect convergence rates.
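The TL;DR contrasts the parameter-free open-loop step with backtracking, which the linear rates for AFW and BPCG use. The sketch below shows what such a step involves, assuming the adaptive backtracking rule of Pedregosa et al. (2020). It searches for a local smoothness estimate $M$ for which a quadratic upper bound on $f$ holds along a direction $\mathbf{d}$, then returns the step that minimizes that bound. Several details are assumptions made for this sketch, not details taken from the paper: the function name `backtracking_step`, the defaults $\eta = 0.9$ and $\tau = 2$, and the convention that `f` returns `math.inf` outside its domain.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def backtracking_step(
    f: Callable[[Vector], float],
    x: Vector,
    d: Vector,
    grad_dot_d: float,
    L_prev: float,
    gamma_max: float = 1.0,
    eta: float = 0.9,
    tau: float = 2.0,
) -> Tuple[float, float]:
    """Adaptive backtracking step (sketch, after Pedregosa et al., 2020).

    Returns (gamma, M) such that
        f(x + gamma d) <= f(x) + gamma <grad f(x), d> + gamma^2 M ||d||^2 / 2.
    Assumes `f` returns math.inf outside its domain, so trial points that
    leave dom f fail the test and force a smaller step.
    """
    if grad_dot_d >= 0.0:
        raise ValueError("d must be a descent direction: <grad f(x), d> < 0")
    d_norm_sq = sum(di * di for di in d)
    fx = f(x)
    M = eta * L_prev  # optimistic start: slightly below the previous estimate
    while True:
        # Minimizer of the quadratic model along d, capped at gamma_max
        gamma = min(-grad_dot_d / (M * d_norm_sq), gamma_max)
        x_trial = [xi + gamma * di for xi, di in zip(x, d)]
        model = fx + gamma * grad_dot_d + 0.5 * gamma * gamma * M * d_norm_sq
        if f(x_trial) <= model:
            return gamma, M
        M *= tau  # model was not an upper bound (or x_trial left dom f)
```

For a Frank-Wolfe step, one would pass $\mathbf{d} = \mathbf{v}_t - \mathbf{x}_t$ and `gamma_max=1`. Unlike the open-loop rule, every trial costs a function evaluation. Returning `math.inf` outside the domain makes the search shrink the step until the trial point is feasible, which matters for objectives such as $-\log$ whose domain is not the whole feasible region.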

Abstract

Generalized self-concordance is a key property present in the objective function of many important learning problems. We establish the convergence rate of a simple Frank-Wolfe variant that uses the open-loop step size strategy $\gamma_t = 2/(t+2)$, obtaining a $\mathcal{O}(1/t)$ convergence rate for this class of functions in terms of primal gap and Frank-Wolfe gap, where $t$ is the iteration count. This avoids the second-order information and the estimates of local smoothness parameters required by previous work. We also show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
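The open-loop rule $\gamma_t = 2/(t+2)$ needs no function values, only one gradient and one linear minimization per iteration. The sketch below applies it with plain Frank-Wolfe, not necessarily the paper's exact algorithm, to an assumed log-utility portfolio objective $f(\mathbf{x}) = -\sum_i \log\langle \mathbf{r}_i, \mathbf{x}\rangle$ over the probability simplex. Each $-\log$ term composed with a linear map is self-concordant, so the sum is too. The loop also records the Frank-Wolfe gap $g(\mathbf{x}_t) = \max_{\mathbf{v}} \langle \nabla f(\mathbf{x}_t), \mathbf{x}_t - \mathbf{v}\rangle$. The objective, the strictly positive return vectors, and all helper names are illustrative assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def log_portfolio_objective(x: Vector, returns: List[Vector]) -> float:
    """f(x) = -sum_i log <r_i, x>, a self-concordant log-utility objective."""
    return -sum(math.log(sum(rj * xj for rj, xj in zip(r, x))) for r in returns)


def log_portfolio_gradient(x: Vector, returns: List[Vector]) -> Vector:
    """Gradient of f: -sum_i r_i / <r_i, x>."""
    grad = [0.0] * len(x)
    for r in returns:
        wealth = sum(rj * xj for rj, xj in zip(r, x))
        for j, rj in enumerate(r):
            grad[j] -= rj / wealth
    return grad


def simplex_lmo(grad: Vector) -> Vector:
    """Linear minimization oracle for the probability simplex.

    argmin_{v in simplex} <grad, v> is the vertex e_k with k = argmin_j grad[j].
    """
    k = min(range(len(grad)), key=grad.__getitem__)
    v = [0.0] * len(grad)
    v[k] = 1.0
    return v


def frank_wolfe_gap(grad: Vector, x: Vector, v: Vector) -> float:
    """g(x) = <grad f(x), x - v> for the LMO output v; upper-bounds f(x) - f*."""
    return sum(g * (xj - vj) for g, xj, vj in zip(grad, x, v))


def frank_wolfe_open_loop(returns: List[Vector], iterations: int) -> Tuple[Vector, List[float]]:
    """Vanilla Frank-Wolfe with the open-loop step gamma_t = 2 / (t + 2)."""
    n = len(returns[0])
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    gaps = []
    for t in range(iterations):
        grad = log_portfolio_gradient(x, returns)
        v = simplex_lmo(grad)
        gaps.append(frank_wolfe_gap(grad, x, v))
        gamma = 2.0 / (t + 2)  # no line search, no smoothness estimate
        x = [xj + gamma * (vj - xj) for xj, vj in zip(x, v)]
    return x, gaps


if __name__ == "__main__":
    returns = [[1.0, 1.2, 0.9], [1.1, 0.8, 1.0], [0.95, 1.05, 1.1]]
    x, gaps = frank_wolfe_open_loop(returns, 500)
    print(x, min(gaps))
```

Because all returns are assumed strictly positive, every point of the simplex lies in the domain of $f$, so the open-loop step cannot leave it. For general generalized self-concordant objectives this is not automatic, and an open-loop iterate can land outside the domain. By convexity, $f(\mathbf{x}) - g(\mathbf{x})$ is a lower bound on the optimal value, which makes the recorded gap a usable stopping criterion.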

Paper Structure

This paper contains 15 sections, 14 theorems, 66 equations, 7 figures, 4 tables, 7 algorithms.

Key Result

Proposition 2.1

Given an $\left(M, \nu\right)$ generalized self-concordant function with $\nu\geq 2$, we have that: where the inequality holds if and only if $d_{\nu}(\mathbf{x}, \mathbf{y}) < 1$ for $\nu > 2$, and we have that, where:

Figures (7)

  • Figure 1: Minimizing $f(\mathbf{x})$ over $\mathcal{P} \cap \mathcal{C}$, versus minimizing the sum of $f(\mathbf{x})$ and $\Phi_{\mathcal{C}}(\mathbf{x})$ over $\mathcal{P}$ for two different penalty values $\mu'$ and $\mu$ such that $\mu' \gg \mu$.
  • Figure 2: Portfolio optimization: Convergence of $h(\mathbf{x}_t)$ and $g(\mathbf{x}_t)$ vs. $t$ and wall-clock time. $n=1000$.
  • Figure 3: Portfolio optimization: Convergence of $h(\mathbf{x}_t)$ and $g(\mathbf{x}_t)$ vs. $t$ and wall-clock time.
  • Figure 4: Logistic regression: Convergence of $h(\mathbf{x}_t)$ and $g(\mathbf{x}_t)$ vs. $t$ and wall-clock time for instances of the LIBSVM dataset.
  • Figure 5: Birkhoff polytope: Convergence of $h(\mathbf{x}_t)$ and $g(\mathbf{x}_t)$ vs. $t$ and wall-clock time on a2a: $(N, n) = (2265, 114)$.
  • ...and 2 more figures

Theorems & Definitions (27)

  • Remark 1.1
  • Example 1.2: Intersection of a convex set with a polytope
  • Definition 1.3: Generalized self-concordant function
  • Proposition 2.1: Proposition 10, sun_generalized_2019
  • Remark 2.2
  • Lemma 2.3: Proposition 7, sun_generalized_2019
  • Theorem 2.4
  • Remark 2.5
  • Theorem 2.6
  • Remark 2.7
  • ...and 17 more