Optimistic Bilevel Optimization with Composite Lower-Level Problem

Mattia Solla, Johannes O. Royset

TL;DR

The paper tackles optimistic bilevel optimization with a composite lower-level problem that is convex but not necessarily strongly convex. It introduces a double regularization, using a Moreau envelope and a quadratic term, to obtain a globally piecewise smooth primal-dual lower-level solution mapping; this yields a computable Jacobian and a gradient formula for the regularized hyper-objective. Under mild assumptions, it proves that the hyper-objective of the actual problem is well defined and that its gradient can be approximated by the gradient of the regularized hyper-objective, with convergence guarantees for gradient-sampling-based algorithms to Clarke stationary points of the true problem. The approach is demonstrated on two machine-learning problems (elastic-net hyperparameter tuning and data poisoning), showing robust performance even when interior-related regularity assumptions fail and highlighting the practical value of the double-regularization and gradient-sampling framework.
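The summary does not say which function the Moreau envelope is applied to, or how the quadratic term enters, so the following is only a minimal sketch of the envelope itself, assuming (hypothetically) that the nonsmooth part is the ℓ1 norm. For a convex f, the envelope e_λ f(x) = min_y f(y) + ||y − x||² / (2λ) is attained at y = prox_{λf}(x), and its gradient is (x − prox_{λf}(x)) / λ. The helper names `prox_l1`, `moreau_env_l1` and `moreau_env_l1_grad` are illustrative, not from the paper.

```python
import numpy as np


def prox_l1(x: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||.||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def moreau_env_l1(x: np.ndarray, lam: float) -> float:
    """Moreau envelope e_lam(x) = min_y ||y||_1 + ||y - x||^2 / (2 lam).

    The minimizer is y = prox_l1(x, lam); for the l1 norm the envelope
    equals the Huber function summed over components.
    """
    y = prox_l1(x, lam)
    return float(np.sum(np.abs(y)) + np.sum((y - x) ** 2) / (2.0 * lam))


def moreau_env_l1_grad(x: np.ndarray, lam: float) -> np.ndarray:
    """Gradient of the envelope, (x - prox(x)) / lam; it is (1/lam)-Lipschitz."""
    return (x - prox_l1(x, lam)) / lam


# Usage: smooth the l1 norm at a point with one small component.
x = np.array([2.0, 0.05, -0.5])
value = moreau_env_l1(x, lam=0.1)
slope = moreau_env_l1_grad(x, lam=0.1)
```

The point of the smoothing is that the envelope is differentiable everywhere with a Lipschitz gradient, while components with |x_i| > λ keep the slope of the original ℓ1 term.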

Abstract

This paper introduces a novel double regularization scheme for bilevel optimization problems whose lower-level problem is composite and convex, but not necessarily strongly convex, in the lower-level variable. The analysis focuses on the primal-dual solution mapping of the regularized lower-level problem and exploits its properties to derive an almost-everywhere formula for the gradient of the regularized hyper-objective under mild assumptions. The paper then establishes conditions under which the hyper-objective of the actual problem is well defined and shows that its gradient can be approximated by the gradient of the regularized hyper-objective. Building on these results, a gradient-sampling-based algorithm computes approximately stationary points of the regularized hyper-objective, and we prove its convergence to stationary points of the actual problem. Two numerical examples from machine learning demonstrate the proposed approach.
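The abstract names a gradient-sampling-based algorithm without giving its steps. As a hedged illustration only, the sketch below implements a simplified version of the classical gradient sampling idea: sample gradients in a ball of radius eps around x, take the minimum-norm element of their convex hull as a search direction, backtrack, and shrink eps when that element is small. This is not the paper's algorithm; the function names, the sample size m = 2n, the radius and tolerance schedule, and the nonsmooth test objective are all assumptions.

```python
import numpy as np


def project_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def min_norm_hull(G: np.ndarray, iters: int = 500) -> np.ndarray:
    """Approximate the min-norm point of conv{rows of G}.

    Solves min_w ||G^T w||^2 over the simplex by projected gradient.
    """
    m = G.shape[0]
    w = np.full(m, 1.0 / m)
    GGt = G @ G.T
    L = 2.0 * np.linalg.eigvalsh(GGt)[-1] + 1e-12  # Lipschitz constant of 2 G G^T w
    for _ in range(iters):
        w_new = project_simplex(w - (2.0 / L) * (GGt @ w))
        done = np.linalg.norm(w_new - w) < 1e-14
        w = w_new
        if done:
            break
    return G.T @ w


def gradient_sampling(f, grad, x0, eps=0.1, nu=1e-4, m=None,
                      eps_min=1e-6, max_iter=300, seed=0):
    """Simplified gradient sampling for a locally Lipschitz f differentiable a.e."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    m = 2 * n if m is None else m  # number of sampled points (at least n + 1 is standard)
    for _ in range(max_iter):
        # Draw m points uniformly from the ball of radius eps centred at x.
        d = rng.standard_normal((m, n))
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        pts = x + eps * rng.uniform(size=(m, 1)) ** (1.0 / n) * d
        G = np.vstack([grad(x)] + [grad(p) for p in pts])
        g = min_norm_hull(G)
        gnorm = np.linalg.norm(g)
        t, fx = 1.0, f(x)
        if gnorm > nu:
            # Armijo backtracking along the search direction -g.
            while f(x - t * g) > fx - 1e-4 * t * gnorm**2 and t > 1e-12:
                t *= 0.5
        if gnorm <= nu or t <= 1e-12:
            # Approximately eps-stationary, or no descent found: shrink the radius.
            eps *= 0.1
            if eps < eps_min:
                break
            continue
        x = x - t * g
    return x


# Hypothetical nonsmooth test objective standing in for a hyper-objective.
def f(x):
    return abs(x[0]) + 2.0 * abs(x[1])


def grad_f(x):
    return np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])


x_star = gradient_sampling(f, grad_f, x0=[1.0, -0.7])
```

Including the gradient at x itself in the sampled bundle means that, far from any kink, the direction is just the ordinary negative gradient; the convex hull only matters once nonsmooth points fall inside the sampling ball.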

Paper Structure

This paper contains 15 sections, 14 theorems, 135 equations, 2 tables, 2 algorithms.

Key Result

Theorem 2.1

(hang2024role). For a $C^1$ mapping $\psi:\mathbb{R}^n\to \mathbb{R}^n$ and an epi-polyhedral function $f:\mathbb{R}^n\to \overline{\mathbb{R}}$, define the set-valued mapping $s$ ... Suppose that $\bar{x}$ satisfies $-\psi(\bar{x})\in \operatorname{ri}(\partial f(\bar{x}))$. Then $\overline{K}=K_f(\bar{x},-\psi(\bar{x}))$ is a linear subspace, and $s$ has a Lipschitz continuous single-valued local ...
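To make the hypothesis of Theorem 2.1 concrete, here is a small sketch for the special case f = ||·||_1, which is polyhedral and hence epi-polyhedral. Writing v = −ψ(x̄), the condition v ∈ ri ∂f(x̄) reduces componentwise to v_i = sign(x̄_i) where x̄_i ≠ 0 and |v_i| < 1 where x̄_i = 0. The critical cone K_f(x̄, v) = {w : f'(x̄; w) = ⟨v, w⟩} is then the subspace {w : w_i = 0 wherever x̄_i = 0}, matching the theorem's conclusion; on the boundary |v_i| = 1 it degenerates to a half-line. The choice of f, the point x̄ and the vector v are illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np


def ri_condition_l1(x_bar, v, tol=1e-12):
    """Check v in ri(subdifferential of ||.||_1 at x_bar), componentwise."""
    x_bar, v = np.asarray(x_bar, float), np.asarray(v, float)
    zero = np.abs(x_bar) <= tol
    on_support = np.all(np.abs(v[~zero] - np.sign(x_bar[~zero])) <= tol)
    strict_off_support = np.all(np.abs(v[zero]) < 1.0 - tol)
    return bool(on_support and strict_off_support)


def dir_deriv_l1(x_bar, w, tol=1e-12):
    """Directional derivative of ||.||_1 at x_bar in direction w."""
    x_bar, w = np.asarray(x_bar, float), np.asarray(w, float)
    zero = np.abs(x_bar) <= tol
    return float(np.sum(np.sign(x_bar[~zero]) * w[~zero]) + np.sum(np.abs(w[zero])))


def in_critical_cone_l1(x_bar, v, w, tol=1e-9):
    """w in K_f(x_bar, v) iff f'(x_bar; w) = <v, w> (v assumed to be a subgradient)."""
    v, w = np.asarray(v, float), np.asarray(w, float)
    return abs(dir_deriv_l1(x_bar, w) - float(v @ w)) <= tol


# Interior case: the cone is the subspace of directions vanishing where x_bar = 0.
x_bar = np.array([1.5, 0.0, -2.0, 0.0])
v = np.array([1.0, 0.3, -1.0, -0.9])  # plays the role of -psi(x_bar)
print(ri_condition_l1(x_bar, v))
```

The boundary case is included to show why the relative-interior assumption matters: with |v_i| = 1 at a zero component, w and −w are no longer both critical, so the cone stops being a subspace.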

Theorems & Definitions (30)

  • Theorem 2.1
  • Proposition 3.1
  • proof
  • Theorem 3.2
  • proof
  • Theorem 3.3
  • proof
  • Theorem 3.4
  • proof
  • Theorem 3.5
  • ...and 20 more