On the convergence of proximal gradient methods for convex simple bilevel optimization

Puya Latafat; Andreas Themelis; Silvia Villa; Panagiotis Patrinos

TL;DR

The paper develops an adaptive linesearch method whose systematic stepsize scheme allows large and nonmonotonic stepsize sequences while remaining insensitive to the choice of hyperparameters and initialization.
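The adaptive linesearch itself is specified in the paper and is not reproduced here. To illustrate how a linesearch can still produce large, nonmonotonic stepsizes, the following is a minimal sketch of a generic backtracking proximal gradient method that enlarges the trial stepsize at every iteration before backtracking. The function name `prox_grad_backtracking`, its parameters (`growth`, `shrink`, `tol`) and the quadratic test problem are hypothetical. This is not AdaBiM.

```python
import numpy as np


def prox_grad_backtracking(f, grad_f, prox_g, x0, gamma0=1.0, growth=2.0,
                           shrink=0.5, iters=500, tol=1e-9):
    """Generic proximal gradient method with an enlarging backtracking test.

    Hypothetical helper (not AdaBiM): every iteration first multiplies the
    previous stepsize by `growth`, then multiplies it by `shrink` until the
    sufficient-decrease test on the smooth part f is met.
    """
    x = np.asarray(x0, dtype=float)
    gamma = gamma0
    steps, backtracks = [], 0
    for _ in range(iters):
        fx, gx = f(x), grad_f(x)
        gamma *= growth  # optimistic enlargement: lets stepsizes grow again
        while True:
            x_new = prox_g(x - gamma * gx, gamma)  # forward-backward step
            d = x_new - x
            # Accept if f lies below its quadratic model with curvature 1/gamma
            if f(x_new) <= fx + gx @ d + (d @ d) / (2.0 * gamma):
                break
            gamma *= shrink
            backtracks += 1
        x = x_new
        steps.append(gamma)
        if np.linalg.norm(d) <= tol:  # fixed-point residual is tiny: stop
            break
    return x, steps, backtracks


# Toy problem: f(x) = 0.5 x^T Q x - c^T x with minimizer Q^{-1} c = (1, 1); g = 0.
Q = np.diag([1.0, 10.0])
c = np.array([1.0, 10.0])
f = lambda x: 0.5 * x @ Q @ x - c @ x
grad_f = lambda x: Q @ x - c
identity_prox = lambda v, gamma: v  # prox of g = 0 is the identity

x, steps, backtracks = prox_grad_backtracking(f, grad_f, identity_prox, np.zeros(2))
```

Because the trial stepsize is doubled before each test, the accepted stepsizes in `steps` can rise after a short step instead of shrinking monotonically; `backtracks` counts the extra trial points, the analogue of the cumulative backtrack counts shown in Figure 1.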

Abstract

This paper studies proximal gradient iterations for solving simple bilevel optimization problems in which both the upper- and the lower-level cost functions are split as the sum of a differentiable and a (possibly nonsmooth) proximable function. We develop a novel convergence recipe for iteration-varying stepsizes that relies on Barzilai-Borwein-type local estimates for the differentiable terms. Leveraging this recipe, we establish convergence under global Lipschitz continuity of the gradients for a nonadaptive stepsize sequence, without requiring strong convexity or a linesearch. In the setting where the gradients are only locally Lipschitz, we develop an adaptive linesearch method whose systematic stepsize scheme enables large and nonmonotonic stepsize sequences while being insensitive to the choice of hyperparameters and initialization. Numerical simulations showcase the favorable convergence speed of our methods.
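The abstract refers to Barzilai-Borwein-type local estimates for the differentiable terms. The exact quantities used in the convergence recipe are defined in the paper; below is a minimal sketch of the two classical estimates one can form from two consecutive iterates, assuming the usual definitions. The helper `bb_estimates` and the quadratic test function are hypothetical.

```python
import numpy as np


def bb_estimates(x, x_prev, g, g_prev):
    """Classical Barzilai-Borwein-type local estimates (hypothetical helper).

    ell = <g - g_prev, x - x_prev> / ||x - x_prev||^2   (local curvature)
    L   = ||g - g_prev|| / ||x - x_prev||                (local Lipschitz)
    """
    dx, dg = x - x_prev, g - g_prev
    nx2 = np.dot(dx, dx)
    ell = np.dot(dg, dx) / nx2
    L = np.linalg.norm(dg) / np.sqrt(nx2)
    return ell, L


# f(x) = 0.5 x^T Q x has a globally 4-Lipschitz gradient.
Q = np.diag([1.0, 4.0])
grad = lambda x: Q @ x
x_prev, x = np.zeros(2), np.ones(2)
ell, L = bb_estimates(x, x_prev, grad(x), grad(x_prev))
# ell = 2.5 and L = sqrt(8.5), roughly 2.915, both below the global constant 4
bb_step = 1.0 / ell  # classical BB1 step <s, s> / <s, y> = 0.4
```

For a convex f, monotonicity of the gradient gives `ell >= 0` and the Cauchy-Schwarz inequality gives `ell <= L`; neither estimate exceeds a global Lipschitz constant when one exists, which is why such local quantities can support larger stepsizes than 1 over the global constant.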

Paper Structure (23 sections, 6 theorems, 66 equations, 4 figures, 1 table, 2 algorithms)

Introduction
Contributions
Organization
Preliminaries
Problem setup and proposed algorithms
The globally Lipschitz case: StaBiM
The locally Lipschitz case: AdaBiM
Observations about the stepsizes
Convergence analysis
A quasi-descent inequality
Convergence recipe for proximal gradient iterations
Simulations
Compared algorithms
Solodov's explicit descent method (SEDM-r)
Bilevel gradient sequential averaging method (BiGSAM)
...and 8 more sections

Key Result

theorem 1

In addition to Assumption (ass:basic), suppose that $\nabla f_i$ is globally $L_{f_i}$-Lipschitz continuous, $i=1,2$, and that $(\sigma_k)_{k\in\mathbb N}$ complies with (eq:sigk). Then ($(x_k)_{k\in\mathbb N}$ is bounded and) $(\operatorname{dist}(x_k, X_1))_{k\in\mathbb N}$ conv… where $M_{\rm max} = 1+\nu+\nu\sigma_0 L_{f_1}/L_{f_{\ldots}}$
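Theorem 1 concerns a nonadaptive stepsize sequence under global Lipschitz continuity of the gradients. The following is a minimal sketch of a generic fixed-stepsize proximal gradient iteration on a weighted sum of the two levels, in the spirit of Solodov-type schemes, on a toy problem whose lower-level solution set is a line. Placing the weight $\sigma_k$ on the upper-level cost, the rule $\sigma_k = \sigma_0/\sqrt{k+1}$, the stepsize $1/(L_{\rm lower} + \sigma_0 L_{\rm upper})$, and all names (`weighted_prox_grad`, `grad_lower`, `grad_upper`) are assumptions for illustration; they are not the paper's condition (eq:sigk) or the StaBiM stepsize.

```python
import numpy as np


def weighted_prox_grad(grad_lower, grad_upper, prox_step, x0, gamma, sigma,
                       iters=10_000):
    """Hypothetical sketch: fixed-step proximal gradient on
    phi_lower + sigma_k * phi_upper (NOT the paper's StaBiM).

    prox_step(v, gamma, s) must return the prox of gamma*(g_lower + s*g_upper).
    """
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        s = sigma(k)
        v = x - gamma * (grad_lower(x) + s * grad_upper(x))  # forward step
        x = prox_step(v, gamma, s)                            # backward step
    return x


# Lower level: 0.5*(x1 + x2 - 2)^2, solution set = the line x1 + x2 = 2.
A, b = np.array([[1.0, 1.0]]), np.array([2.0])
grad_lower = lambda x: A.T @ (A @ x - b)
# Upper level: 0.5*||x||^2, so the bilevel solution is the min-norm point (1, 1).
grad_upper = lambda x: x
no_prox = lambda v, gamma, s: v            # both nonsmooth parts are zero here
sigma0 = 1.0
sigma = lambda k: sigma0 / np.sqrt(k + 1)  # assumed rule: -> 0, not summable
gamma = 1.0 / (2.0 + sigma0)               # L_lower = ||A||^2 = 2, L_upper = 1

# Start on the lower-level solution set, but far from the min-norm point.
x = weighted_prox_grad(grad_lower, grad_upper, no_prox, [3.0, -1.0], gamma, sigma)
```

With a vanishing but non-summable weight, the iterate approaches the lower-level solution set while the upper-level term gradually selects the minimum-norm point; the residual gap of order $\sigma_k$ at the last iteration is why the check below uses a loose tolerance.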

Figures (4)

Figure 1: A representative plot showing the cumulative number of backtracks needed by AdaBiM and SEDM (top row) and stepsize magnitudes in a window of 100 iterations (bottom row) in sample simulations from the Simulations section. The numerical suffixes -1, -10, and -100 in SEDM indicate different choices for the value of $\widehat{\alpha}_0$ as defined in the section on Solodov's explicit descent method. Left: logistic regression (a5a dataset); center: linear inverse problem; right: solution of integral equations.
Figure 2: Logistic regression problems from the Simulations section with minimum $\ell^p$-norm solution, $p=1,2$.
Figure 3: Linear inverse problems from the Simulations section with minimum $\ell^p$-norm solution, $p=1,2$.
Figure 4: Solution of integral equations.
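Figures 2 and 3 concern problems whose upper level selects a minimum $\ell^p$-norm solution. For $p=1$ the nonsmooth upper-level term enters through its prox, which is soft-thresholding. The following is a minimal sketch on a hypothetical $1\times 2$ linear system (not the paper's benchmark data), assuming a Solodov-type vanishing weight $\sigma_k = 1/\sqrt{k+1}$ on the upper level and stepsize $1/\|A\|^2$; the helper `soft_threshold` and these choices are illustrative, not the paper's method.

```python
import numpy as np


def soft_threshold(v, t):
    """Prox of t*||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


# Lower level: 0.5*||A x - b||^2, solution set = the line x1 + 2*x2 = 2.
# Upper level: ||x||_1, minimized on that line at the sparse point (0, 1).
A, b = np.array([[1.0, 2.0]]), np.array([2.0])
gamma = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / L_lower = 1/5
sigma = lambda k: 1.0 / np.sqrt(k + 1)    # assumed vanishing, non-summable weight

x = np.zeros(2)
for k in range(20_000):
    v = x - gamma * A.T @ (A @ x - b)         # gradient step on the lower level
    x = soft_threshold(v, gamma * sigma(k))   # prox of gamma*sigma_k*||.||_1
```

The iterate settles at the sparse lower-level solution: the first coordinate is thresholded to exactly zero and the second approaches 1 as $\sigma_k$ decreases. Replacing the soft-thresholding by a gradient step on $\tfrac12\|x\|^2$ would instead select the minimum $\ell^2$-norm solution, the $p=2$ case.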

Theorems & Definitions (13)

remark 1: prox-friendliness of $g_k$
remark 2: Nonlinear programs (NLPs)
theorem 1: convergence of StaBiM
theorem 2: convergence of AdaBiM
remark 3: Bar notation for minima and shifted costs
lemma 1: quasi-descent inequality
proof
lemma 2
proof
theorem 3: convergence recipe for proximal gradient iterations
...and 3 more
