Accelerating Diagonal Methods for Bilevel Optimization: Unified Convergence via Continuous-Time Dynamics

Radu Ioan Boţ, Enis Chenchene, Ernö Robert Csetnek, David Alexander Hulett

TL;DR

This work addresses the efficient solution of simple bilevel programs, in which the lower-level problem does not depend on the upper-level variable, by developing diagonal (Tikhonov) methods that discretize continuous-time dynamics. Two algorithms are analyzed: a proximal-gradient method (first-order) and a fast proximal-gradient method with Nesterov momentum (second-order), each with a polynomially decaying regularization ε_k = c/(k+β)^δ. A unified Lyapunov-based framework yields explicit convergence rates under Hölderian growth or the Attouch–Czarnecki condition and establishes weak convergence to bilevel solutions in infinite-dimensional Hilbert spaces. The results extend prior work to general geometric assumptions, allow flexible parameter schedules, and are supported by numerical experiments on linear inverse problems and logistic regression. The continuous-time analysis provides a principled basis for the discrete schemes and highlights the trade-offs between geometry, regularization decay, and acceleration.
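To make the diagonal idea concrete, here is a minimal sketch of a first-order iteration with the regularization schedule ε_k = c/(k+β)^δ quoted above. It is not the paper's algorithm: it assumes both objectives are smooth (so the proximal step reduces to the identity), and the toy problem, the function name `diagonal_gradient`, the step size, and the values of c, β, and δ are all illustrative assumptions chosen for this example.

```python
import numpy as np

def diagonal_gradient(grad_inner, grad_outer, x0, step,
                      c=1.0, beta=1.0, delta=0.5, iters=2000):
    """Gradient steps on g + eps_k * f, with eps_k = c / (k + beta)**delta.

    Hypothetical helper: a smooth special case of a diagonal (Tikhonov)
    scheme, written for illustration only.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        eps_k = c / (k + beta) ** delta  # polynomially decaying weight on the outer objective
        x = x - step * (grad_inner(x) + eps_k * grad_outer(x))
    return x

# Toy problem (assumed for this sketch):
#   inner g(x) = 0.5 * (x1 + x2 - 2)^2, whose minimizers form the line x1 + x2 = 2
#   outer f(x) = 0.5 * ||x||^2, which selects the minimum-norm minimizer
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
grad_inner = lambda x: A.T @ (A @ x - b)
grad_outer = lambda x: x

x0 = np.array([3.0, -1.0])       # minimizes g already, but is not minimum-norm
x_final = diagonal_gradient(grad_inner, grad_outer, x0, step=0.25)
x_star = np.linalg.pinv(A) @ b   # minimum-norm least-squares solution, here [1, 1]
print(x_final, x_star)
```

In this toy run the iterate starts at a minimizer of the inner problem that is not the bilevel solution and drifts toward the minimum-norm point. The remaining gap is governed by the size of the last ε_k, which gives a small-scale picture of the trade-off between regularization decay and accuracy mentioned in the summary.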

Abstract

We analyze fast diagonal methods for simple bilevel programs. Guided by the analysis of the corresponding continuous-time dynamics, we provide a unified convergence analysis under general geometric conditions, including Hölderian growth and the Attouch-Czarnecki condition. Our results yield explicit convergence rates and guarantee weak convergence to a solution of the bilevel problem. In particular, we improve and extend recent results on accelerated schemes, offering novel insights into the trade-offs between geometry, regularization decay, and algorithmic design. Numerical experiments illustrate the advantages of more flexible methods and support our theoretical findings.

Paper Structure

This paper contains 22 sections, 14 theorems, 124 equations, 4 figures, 2 algorithms.

Key Result

Lemma 2.1

Let $(x_k)_{k \geq 0}$ be the sequence generated by Algorithm \ref{alg:first_order} and $(E_k^\lambda)_{k \geq 1}$ be defined as in \eqref{eq:lyapunov_first_order}. Then, there exists $k_0 \geq 1$ such that for all $k \geq k_0$ where $(\zeta_{k, \delta})_{k \geq 1}$ is a sequence defined by

Figures (4)

  • Figure 1: Geometric setting considered in this paper, where $\rho^*$ is the dual exponent of $\rho$, see \eqref{eq:def_rho_star}. For a discussion of results obtained under even milder assumptions, see Section \ref{sec:discussion}.
  • Figure 2: Results of the experiment in Subsection \ref{sec:num_nem}: Comparing inner and outer function residuals as well as distance to solution for the methods listed in Subsection \ref{sec:num_compared_algorithms}.
  • Figure 3: Results of the experiment in Section \ref{sec:num_nem}: Comparing residual decrease for Algorithm \ref{alg:second_order} as a function of the iteration number, emphasizing the choice of $\delta \in (1, 2)$.
  • Figure 4: Results of the experiment in Subsection \ref{sec:num_logistic}. The dotted lines in Figure \ref{fig:logistic_obj_inner} represent the rates $\mathcal{O}(k^{-\delta / 2})$ and $\mathcal{O}(k^{-\delta})$, with $\delta=1.9$.

Theorems & Definitions (24)

  • Lemma 2.1
  • proof
  • Remark 2.2
  • Theorem 2.3
  • proof
  • Lemma 3.1
  • proof
  • Remark 3.2
  • Theorem 3.3
  • proof
  • ...and 14 more