On the convergence of proximal gradient methods for convex simple bilevel optimization

Puya Latafat, Andreas Themelis, Silvia Villa, Panagiotis Patrinos

TL;DR

An adaptive linesearch method is developed that introduces a systematic adaptive scheme enabling large and nonmonotonic stepsize sequences while being insensitive to the choice of hyperparameters and initialization.

Abstract

This paper studies proximal gradient iterations for solving simple bilevel optimization problems where both the upper and the lower level cost functions are split as the sum of differentiable and (possibly nonsmooth) proximable functions. We develop a novel convergence recipe for iteration varying stepsizes that relies on Barzilai-Borwein type local estimates for the differentiable terms. Leveraging the convergence recipe, under global Lipschitz gradient continuity, we establish convergence for a nonadaptive stepsize sequence, without requiring any strong convexity or linesearch. In the locally Lipschitz differentiable setting, we develop an adaptive linesearch method that introduces a systematic adaptive scheme enabling large and nonmonotonic stepsize sequences while being insensitive to the choice of hyperparameters and initialization. Numerical simulations are provided showcasing favorable convergence speed of our methods.
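As a rough illustration of the kind of iteration the abstract describes, the sketch below runs a proximal gradient step on a weighted combination of the two levels, with the upper-level weight driven to zero. It is not the paper's method. The weight schedule `sigma = 1/sqrt(k+1)`, the stepsize `1/(L1 + sigma*L2)` built from global Lipschitz constants, the function name `prox_grad_bilevel`, and the toy problem are all assumptions made for this example, and no Barzilai-Borwein estimate or linesearch is used. The toy lower level is $\tfrac12(x_1+x_2-2)^2$ restricted to $x\ge 0$, and the toy upper level is $\tfrac12\|x\|^2$, so the bilevel solution is the minimum-norm lower-level minimizer $(1,1)$.

```python
from math import sqrt

def prox_grad_bilevel(grad_f1, grad_f2, prox_g, L1, L2, x0, iters=5000):
    """Illustrative sketch (hypothetical helper, not the paper's algorithm):
    x+ = prox_g(x - gamma_k * (grad_f1(x) + sigma_k * grad_f2(x)))."""
    x = list(x0)
    for k in range(iters):
        sigma = 1.0 / sqrt(k + 1)        # assumed schedule: vanishing, non-summable
        gamma = 1.0 / (L1 + sigma * L2)  # assumed stepsize from global Lipschitz constants
        g1, g2 = grad_f1(x), grad_f2(x)
        # forward (gradient) step on the combined smooth part f1 + sigma * f2
        y = [xi - gamma * (a + sigma * b) for xi, a, b in zip(x, g1, g2)]
        # backward (proximal) step on the nonsmooth part
        x = prox_g(y, gamma)
    return x

# Toy lower level: f1(x) = 0.5 * (x1 + x2 - 2)^2, g1 = indicator of {x >= 0}
def grad_f1(x):
    r = x[0] + x[1] - 2.0
    return [r, r]

# Toy upper level: f2(x) = 0.5 * ||x||^2, g2 = 0
def grad_f2(x):
    return list(x)

# prox of an indicator function is the projection (independent of gamma)
def project_nonneg(y, gamma):
    return [max(0.0, yi) for yi in y]

# L1 = 2 is the largest eigenvalue of [[1, 1], [1, 1]]; L2 = 1 for 0.5 * ||x||^2
x = prox_grad_bilevel(grad_f1, grad_f2, project_nonneg,
                      L1=2.0, L2=1.0, x0=[3.0, 0.0])
print(x)  # both coordinates should end up close to 1
```

Every point with $x_1+x_2=2$, $x\ge 0$ minimizes the lower level here, so the run only lands near $(1,1)$ because of the vanishing upper-level weight. The remaining gap shrinks with the final value of `sigma`, which is why a fixed slow schedule like this one converges slowly.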

Paper Structure (23 sections, 6 theorems, 66 equations, 4 figures, 1 table, 2 algorithms)


Key Result

theorem 1

In addition to \ref{ass:basic}, suppose that $\nabla f_i$ are globally $L_{f_i}$-Lipschitz continuous, $i=1,2$, and that $(\sigma_k)_{k\in\mathbb{N}}$ complies with \eqref{eq:sigk}. Then ($(x_k)_{k\in\mathbb{N}}$ is bounded and) $(\operatorname{dist}(x_k, X_1))_{k\in\mathbb{N}}$ conv… where $M_{\rm max} = 1+\nu+\nu\sigma_0 L_{f_1}/L_{f_{\ldots}}$ …

Figures (4)

  • Figure 1: A representative plot showing the cumulative number of backtracks needed by \ref{alg:adabim} and SEDM (top row) and stepsize magnitudes in a window of 100 iterations (bottom row) in sample simulations from \ref{sec:num}. The numerical suffixes -1, -10, and -100 in SEDM indicate different choices for the value of $\widehat{\alpha}_0$ as defined in \ref{sec:solodov}. Left: logistic regression (a5a dataset); center: linear inverse problem; right: solution of integral equations.
  • Figure 2: Logistic regression problems of \ref{sec:logreg} with minimum $\ell^p$-norm solution, $p=1,2$.
  • Figure 3: Linear inverse problems of \ref{sec:linverse} with minimum $\ell^p$-norm solution, $p=1,2$.
  • Figure 4: Solution of integral equations

Theorems & Definitions (13)

  • remark 1: prox-friendliness of $g_k$
  • remark 2: Nonlinear programs (NLPs)
  • theorem 1: convergence of \ref{alg:stabim}
  • theorem 2: convergence of \ref{alg:adabim}
  • remark 3: Bar notation for minima and shifted costs
  • lemma 1: quasi-descent inequality
  • proof
  • lemma 2
  • proof
  • theorem 3: convergence recipe for proximal gradient iterations
  • ...and 3 more