On the Convergence Rates of Iterative Regularization Algorithms for Composite Bi-Level Optimization

Shimrit Shtern, Adeolu Taiwo

TL;DR

The paper tackles bi-level optimization where both inner and outer objectives are composite convex. It develops two proximal-gradient–based schemes, IRE-PG and its accelerated variant IRE-APG, to obtain simultaneous convergence rates for the inner and outer functions, showing a tunable trade-off via a regularization sequence $\sigma_k$. When proximal computations are intractable, a surrogate lifting is introduced, accompanied by a rate-translation analysis that quantifies how surrogate performance maps to the original problem and how acceleration gains may degrade under translation. Empirically, the methods exhibit the predicted ergodic rates and illustrate the interplay between inner/outer convergence across $\beta$ values and surrogate settings. Overall, the work advances understanding of rate-optimal iterative regularization for composite bi-level problems and highlights both the potential and limits of acceleration in this context.

Abstract

This paper investigates iterative methods for solving bi-level optimization problems where both inner and outer functions have a composite structure. We establish novel theoretical results, including the first analysis that provides simultaneous convergence rates for the Iteratively REgularized Proximal Gradient (IRE-PG) method, a variant of Solodov's algorithm. These rates for the inner and outer functions highlight the inherent trade-offs between their respective convergence behaviors. We further extend this analysis to an accelerated version of IRE-PG, proving faster convergence rates under specific settings. Additionally, we propose a new scheme for handling cases where these methods cannot be directly applied to the bi-level problem due to the difficulty of computing the associated proximal operator. This scheme offers surrogate functions to approximate the original problem and a framework to translate convergence rates between the surrogate and original functions. Our results show that the accelerated method's advantage diminishes under this translation.
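To make the iterative-regularization idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the usual form of such a step: one proximal-gradient update on the combined objective "inner + $\sigma_k \cdot$ outer", with $\sigma_k$ decaying to zero. The test problem (inner $\tfrac12\|Ax-b\|^2$, outer $\tfrac12\|x\|^2+\lambda\|x\|_1$), the schedule $\sigma_k=(k+1)^{-\beta}$, the step-size rule, and the names `ire_pg_sketch` and `soft_threshold` are all illustrative assumptions that do not come from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ire_pg_sketch(A, b, lam=0.1, beta=0.5, num_iters=2000):
    """Hypothetical iteratively regularized proximal-gradient loop.

    Inner problem (assumed): minimize 0.5 * ||A x - b||^2.
    Outer problem (assumed): minimize 0.5 * ||x||^2 + lam * ||x||_1
    over the inner solution set.
    """
    x = np.zeros(A.shape[1])
    L_inner = np.linalg.norm(A, 2) ** 2  # smoothness constant of the inner smooth part
    L_outer = 1.0                        # smoothness constant of 0.5 * ||x||^2
    for k in range(num_iters):
        sigma = (k + 1) ** (-beta)       # assumed regularization schedule sigma_k
        step = 1.0 / (L_inner + sigma * L_outer)
        # Gradient of the smooth part of (inner + sigma * outer).
        grad = A.T @ (A @ x - b) + sigma * x
        # Prox of the nonsmooth part sigma * lam * ||.||_1, scaled by the step.
        x = soft_threshold(x - step * grad, step * sigma * lam)
    return x

# Toy instance: the inner solutions form the line x1 + x2 = 1,
# and the outer objective selects the point (0.5, 0.5) on it.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x = ire_pg_sketch(A, b)
inner_gap = 0.5 * np.sum((A @ x - b) ** 2)
```

In this toy setting a larger `beta` makes $\sigma_k$ vanish faster, which favors the inner objective at the expense of the outer one. This is the kind of trade-off the abstract refers to, though the rates themselves are established only in the paper.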

Paper Structure

This paper contains 14 sections, 18 theorems, 81 equations, 2 figures.

Key Result

Lemma 1

Let $f:D\to \mathbb{R}$ be an $L_f$-smooth function over a convex set $D$. Then for any $x,y\in D$, and $L\geq L_f$
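The inequality that concludes this statement is not shown here. Since the lemma is listed as the Descent Lemma, its standard textbook form is given below for reference, on the assumption that the paper's display matches it:

$$
f(y) \leq f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2}\,\|y - x\|^2 .
$$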

Figures (2)

  • Figure 1: Performance of sequences generated by IRE-PG and IRE-APG for different values of $\beta$
  • Figure 2: Performance of IRE-PG and IRE-APG for different values of $\rho$

Theorems & Definitions (35)

  • Definition 1
  • Lemma 1: Descent Lemma
  • Lemma 2: Second Prox Theorem
  • Lemma 3
  • proof
  • Lemma 4
  • Lemma 5
  • proof
  • Lemma 6
  • proof
  • ...and 25 more