Iteration Complexity of Frank-Wolfe and Its Variants for Bilevel Optimization

Anthony Palmieri, Francesco Rinaldi, Saverio Salzo, Sara Venturini

TL;DR

We derive overall iteration complexity guarantees for Frank-Wolfe methods in constrained bilevel optimization by combining convergence rates under gradient errors with recent bounds on hypergradient errors from iterative and approximate implicit differentiation.

Abstract

We study Frank-Wolfe (FW) methods for constrained bilevel optimization when the lower-level problem is solved only approximately, yielding biased and inexact hypergradients. We analyze inexact variants of vanilla FW as well as away-step and pairwise FW, and provide convergence rates in the nonconvex setting under gradient errors. By combining these results with recent bounds on hypergradient errors from iterative and approximate implicit differentiation, we derive overall iteration complexity guarantees for bilevel FW. Experiments on two real-world applications validate the theory and demonstrate practical effectiveness.
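
To make the setting concrete, the following is a minimal sketch of vanilla FW driven by an inexact hypergradient; it is not the paper's algorithm. Everything in it is an assumption chosen for illustration: the toy quadratic lower-level problem, the simplex constraint, the names `approx_hypergradient`, `exact_hypergradient`, and `inexact_frank_wolfe`, and the classical open-loop step size $2/(n+2)$, which stands in for the paper's step-size conditions. The hypergradient is approximated in the spirit of approximate implicit differentiation: the lower-level problem and the associated linear system are each solved with a fixed number of gradient steps, so the returned direction is biased.

```python
import numpy as np

def approx_hypergradient(x, H, B, t, inner_steps):
    """Biased hypergradient for a toy bilevel problem (illustrative assumption):
    lower level: y*(x) = argmin_y 0.5*y^T H y - y^T B x  (H symmetric positive definite)
    upper level: F(x)  = 0.5*||y*(x) - t||^2
    """
    step = 1.0 / np.linalg.norm(H, 2)   # 1/L step for the quadratic lower level
    y = np.zeros(H.shape[0])
    for _ in range(inner_steps):        # inexact lower-level solve
        y -= step * (H @ y - B @ x)
    v = np.zeros_like(y)
    for _ in range(inner_steps):        # inexact solve of H v = grad_y f(x, y)
        v -= step * (H @ v - (y - t))
    return B.T @ v                      # approximates B^T H^{-1} (y*(x) - t)

def exact_hypergradient(x, H, B, t):
    """Closed-form hypergradient of the toy problem, used only for comparison."""
    y_star = np.linalg.solve(H, B @ x)
    return B.T @ np.linalg.solve(H, y_star - t)

def inexact_frank_wolfe(grad_oracle, x0, max_iter=500, tol=1e-3):
    """Vanilla FW over the probability simplex with an inexact gradient oracle."""
    x = x0.copy()
    best_gap = np.inf
    for n in range(max_iter):
        g = grad_oracle(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0           # linear minimization oracle: best vertex
        gap = g @ (x - s)               # FW gap measured with the inexact gradient
        best_gap = min(best_gap, gap)
        if gap <= tol:                  # stop once the (inexact) gap is small
            break
        eta = 2.0 / (n + 2.0)           # open-loop step size (assumption)
        x = x + eta * (s - x)           # convex combination keeps x feasible
    return x, best_gap

# Small synthetic instance (all data here is made up for the example).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
H = M @ M.T / 5 + np.eye(5)
B = rng.standard_normal((5, 8))
t = rng.standard_normal(5)
x0 = np.full(8, 1 / 8)
x_final, best_gap = inexact_frank_wolfe(
    lambda x: approx_hypergradient(x, H, B, t, inner_steps=20), x0
)
```

Raising `inner_steps` shrinks the gap between `approx_hypergradient` and `exact_hypergradient`, at the cost of more lower-level work per outer iteration; this is the trade-off that an overall iteration complexity bound has to account for. Away-step and pairwise FW would replace the update direction `s - x` and are not shown here.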

Paper Structure

This paper contains 16 sections, 4 theorems, 94 equations, 1 figure, 1 table, 3 algorithms.

Key Result

Theorem 3.1

Under Assumptions \ref{assumptionbasic} and \ref{assumption}, suppose that in Algorithm \ref{alg:final_FW} the step size $\eta_n$ satisfies the following conditions with some fixed $\rho > 0$: [...]. Then Algorithm \ref{alg:final_FW} terminates after a finite number of iterations, specifically in at most [...] iterations, where [...]. Moreover, for the best FW gap we have [...].

Figures (1)

  • Figure 1: Frank-Wolfe gap (log scale) versus outer iterations for FW, Away-Step FW (ASFW), and Pairwise FW (PWFW): (a) multilayer semi-supervised learning on synthetic datasets; (b) data distillation on the real BLOG dataset.

Theorems & Definitions (12)

  • Theorem 3.1
  • Remark 3.2
  • Lemma 3.3
  • Remark 3.3
  • Remark 3.4
  • Theorem 3.5
  • Remark 3.6
  • Theorem 3.7
  • Proof of \ref{nonconvb}
  • Proof of \ref{nonconvb-away}
  • ...and 2 more