
A Doubly Stochastically Perturbed Algorithm for Linearly Constrained Bilevel Optimization

Prashant Khanduri, Ioannis Tsaknakis, Yihua Zhang, Sijia Liu, Mingyi Hong

TL;DR

The paper tackles stochastic bilevel optimization (BLO) in which the lower-level (LL) problem is strongly convex and linearly constrained, a setting where the LL solution map can be non-differentiable. It introduces a smoothing scheme based on random perturbations that renders the stochastic implicit objective differentiable, and derives closed-form stochastic gradients via the perturbed LL KKT system. Building on this, the authors develop DS-BLO, a doubly stochastic algorithm that uses two perturbations to achieve dimension-free finite-time convergence to an $(\epsilon, \overline{\delta})$-Goldstein stationary point, with a link to the original problem under moderate additional assumptions. Empirical results on adversarial training and synthetic BLO problems corroborate the method's efficiency, robustness, and practicality, showing improvements over state-of-the-art approaches. Overall, DS-BLO provides a scalable, theoretically grounded framework for stochastic BLO with LL constraints that does not rely on restrictive differentiability assumptions or on access to dual variables.
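For reference, the standard Goldstein notion of approximate stationarity for nonsmooth nonconvex functions is written out below. We assume the paper's definition follows this standard form, with $\partial F$ the Clarke subdifferential and $\mathbb{B}_{\delta}(\mathbf{x})$ the closed Euclidean ball of radius $\delta$ around $\mathbf{x}$; the exact statement in the paper may differ in detail.

$$\partial_{\delta} F(\mathbf{x}) = \operatorname{conv}\Big(\bigcup_{\mathbf{z} \in \mathbb{B}_{\delta}(\mathbf{x})} \partial F(\mathbf{z})\Big), \qquad \mathbf{x} \text{ is } (\epsilon, \delta)\text{-Goldstein stationary if } \min_{\mathbf{g} \in \partial_{\delta} F(\mathbf{x})} \|\mathbf{g}\| \le \epsilon.$$

When $F$ is continuously differentiable, $\partial_{\delta} F(\mathbf{x})$ shrinks to $\{\nabla F(\mathbf{x})\}$ as $\delta \to 0$, recovering the usual condition $\|\nabla F(\mathbf{x})\| \le \epsilon$. Reading the rate $\widetilde{O}(\epsilon^{-4}\overline{\delta}^{-1})$ quoted in the abstract: halving $\epsilon$ at fixed $\overline{\delta}$ multiplies the iteration bound by $2^4 = 16$, whereas halving $\overline{\delta}$ only doubles it.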

Abstract

In this work, we develop analysis and algorithms for a class of (stochastic) bilevel optimization problems whose lower-level (LL) problem is strongly convex and linearly constrained. Most existing approaches for solving such problems rely on unrealistic assumptions or penalty function-based approximate reformulations that are not necessarily equivalent to the original problem. We develop a stochastic algorithm based on an implicit gradient approach, suitable for data-intensive applications. It is well known that for the class of problems of interest, the implicit function is nonsmooth. To circumvent this difficulty, we apply a smoothing technique that involves adding small random (linear) perturbations to the LL objective and then taking the expectation of the implicit objective over these perturbations. This approach gives rise to a novel stochastic formulation that ensures the differentiability of the implicit function and leads to the design of a novel and efficient doubly stochastic algorithm. We show that the proposed algorithm converges to an $(\epsilon, \overline{\delta})$-Goldstein stationary point of the stochastic objective in $\widetilde{O}(\epsilon^{-4} \overline{\delta}^{-1})$ iterations. Moreover, under certain additional assumptions, we establish the same convergence guarantee for the algorithm to achieve a $(3\epsilon, \overline{\delta} + O(\epsilon))$-Goldstein stationary point of the original objective. Finally, we perform experiments on adversarial training (AT) tasks to showcase the utility of the proposed algorithm.
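The smoothing idea can be illustrated on a toy instance. The sketch below is an assumption-laden illustration, not the paper's DS-BLO: it takes a hypothetical LL objective $g(\mathbf{x},\mathbf{y}) = \tfrac12 \mathbf{y}^\top H \mathbf{y} - \mathbf{y}^\top B \mathbf{x}$ with diagonal $H \succ 0$ and constraints $\mathbf{y} \le \mathbf{b}$ (so $A = I$), a hypothetical UL objective $f(\mathbf{x},\mathbf{y}) = \tfrac12\|\mathbf{y} - \mathbf{c}\|^2 + \tfrac{\mu}{2}\|\mathbf{x}\|^2$, and Gaussian perturbations $\mathbf{q}$ of scale `sigma`. For a fixed $\mathbf{q}$ it differentiates the KKT system restricted to the active constraint rows $\bar{A}$, giving $\nabla \mathbf{y}^{\ast}_{\mathbf{q}}(\mathbf{x}) = -\big[H^{-1} - H^{-1}\bar{A}^\top(\bar{A}H^{-1}\bar{A}^\top)^{-1}\bar{A}H^{-1}\big]\nabla^2_{\mathbf{yx}} g$, which is valid when the active rows are linearly independent and no constraint is only weakly active (the situation a continuously distributed perturbation is meant to make typical). Averaging single-sample gradients then gives a Monte Carlo estimate of $\nabla\overline{F}(\mathbf{x})$, assuming gradient and expectation can be interchanged. All function and variable names are hypothetical, and the second perturbation used by DS-BLO is not modeled.

```python
import numpy as np


def solve_perturbed_ll(x, q, h_diag, B, b):
    """argmin_y 0.5*y^T H y - y^T B x + q^T y  s.t.  y <= b, with H = diag(h_diag) > 0.

    A diagonal H makes the problem separable by coordinate, so the solution is
    the unconstrained minimizer (B x - q) / h_diag clipped from above at b.
    """
    return np.minimum((B @ x - q) / h_diag, b)


def ll_jacobian(y, h_diag, B, b, tol=1e-12):
    """Jacobian dy*/dx from the KKT system restricted to the active constraints.

    With A = I the active rows are the unit vectors e_i for which y_i = b_i.
    Returns -[H^-1 - H^-1 A^T (A H^-1 A^T)^-1 A H^-1] @ grad2_yx(g).
    """
    H_inv = np.diag(1.0 / h_diag)
    grad2_yx = -B                                  # cross Hessian of g in (y, x)
    A_act = np.eye(y.size)[np.abs(y - b) <= tol]   # active constraint rows
    M = H_inv
    if A_act.shape[0] > 0:
        S = A_act @ H_inv @ A_act.T                # invertible: distinct unit rows
        M = H_inv - H_inv @ A_act.T @ np.linalg.solve(S, A_act @ H_inv)
    return -M @ grad2_yx                           # shape (d_l, d_u)


def upper_objective(x, y, c, mu):
    """Toy UL objective f(x, y) = 0.5*||y - c||^2 + 0.5*mu*||x||^2."""
    return 0.5 * np.sum((y - c) ** 2) + 0.5 * mu * np.sum(x ** 2)


def perturbed_implicit_grad(x, q, h_diag, B, b, c, mu):
    """Gradient of F_q(x) = f(x, y*_q(x)) for one fixed perturbation q."""
    y = solve_perturbed_ll(x, q, h_diag, B, b)
    J = ll_jacobian(y, h_diag, B, b)
    return mu * x + J.T @ (y - c)                  # grad_x f + J^T grad_y f


# Illustrative toy instance (sizes and values are arbitrary).
rng = np.random.default_rng(0)
d_u, d_l, mu, sigma = 3, 4, 0.1, 1e-2
B = rng.standard_normal((d_l, d_u))
h_diag = rng.uniform(1.0, 2.0, d_l)
b = np.zeros(d_l)
c = rng.standard_normal(d_l)
x = rng.standard_normal(d_u)

# Single-sample gradient for one random linear perturbation q.
q = sigma * rng.standard_normal(d_l)
g_single = perturbed_implicit_grad(x, q, h_diag, B, b, c, mu)

# Monte Carlo estimate of the smoothed gradient E_q[grad F_q(x)].
g_mc = np.mean(
    [perturbed_implicit_grad(x, sigma * rng.standard_normal(d_l), h_diag, B, b, c, mu)
     for _ in range(200)],
    axis=0,
)


# Sanity check: central finite differences of F_q at x (valid away from kinks).
def F_q(z):
    return upper_objective(z, solve_perturbed_ll(z, q, h_diag, B, b), c, mu)


step = 1e-6
g_fd = np.array([(F_q(x + step * e) - F_q(x - step * e)) / (2 * step)
                 for e in np.eye(d_u)])
```

Because $F_{\mathbf{q}}$ is quadratic on each region of fixed active set in this toy model, `g_fd` should agree with `g_single` up to rounding; the agreement breaks only at kinks where a coordinate of $(B\mathbf{x} - \mathbf{q})/h$ equals the bound, which the random perturbation makes unlikely.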

Paper Structure

This paper contains 21 sections, 20 theorems, 132 equations, 1 figure, 3 tables, 1 algorithm.

Key Result

Proposition 2.1

Under Assumptions ass:basics and ass:Fn_UL_LL, we have: where $F(\mathbf{x})$ and $\overline{F}(\mathbf{x})$ are defined in eq:Problem_Bilevel and eq:Stochastic_Problem_Bilevel, respectively.
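To make the two objectives concrete, here is a sketch of their likely form, assuming the LL feasible set is $\{\mathbf{y} : A\mathbf{y} \le \mathbf{b}\}$ with $A$ and $\mathbf{b}$ independent of $\mathbf{x}$, and $\mathbf{q}$ a random perturbation vector with $\mathbb{E}\|\mathbf{q}\| < \infty$ (in the stochastic setting, $f$ and $g$ are themselves expectations over data samples). The bound that follows is an elementary one of the same flavor, derived only from $\mu_g$-strong convexity of $g(\mathbf{x},\cdot)$ and $L_f$-Lipschitz continuity of $f(\mathbf{x},\cdot)$ (the constants $\mu_g$, $L_f$ are hypothetical names); it is not necessarily the statement of Proposition 2.1.

$$F(\mathbf{x}) = f\big(\mathbf{x}, \mathbf{y}^{\ast}(\mathbf{x})\big), \quad \mathbf{y}^{\ast}(\mathbf{x}) = \arg\min_{A\mathbf{y} \le \mathbf{b}} g(\mathbf{x}, \mathbf{y}); \qquad \overline{F}(\mathbf{x}) = \mathbb{E}_{\mathbf{q}}\big[f\big(\mathbf{x}, \mathbf{y}^{\ast}_{\mathbf{q}}(\mathbf{x})\big)\big], \quad \mathbf{y}^{\ast}_{\mathbf{q}}(\mathbf{x}) = \arg\min_{A\mathbf{y} \le \mathbf{b}} \big\{ g(\mathbf{x}, \mathbf{y}) + \mathbf{q}^{\top}\mathbf{y} \big\}.$$

Adding the variational optimality conditions of the perturbed and unperturbed LL problems and using strong convexity gives $\mu_g\|\mathbf{y}^{\ast}_{\mathbf{q}}(\mathbf{x}) - \mathbf{y}^{\ast}(\mathbf{x})\|^2 \le \langle \mathbf{q}, \mathbf{y}^{\ast}(\mathbf{x}) - \mathbf{y}^{\ast}_{\mathbf{q}}(\mathbf{x})\rangle$, hence

$$\|\mathbf{y}^{\ast}_{\mathbf{q}}(\mathbf{x}) - \mathbf{y}^{\ast}(\mathbf{x})\| \le \frac{\|\mathbf{q}\|}{\mu_g} \quad\Longrightarrow\quad |\overline{F}(\mathbf{x}) - F(\mathbf{x})| \le \mathbb{E}_{\mathbf{q}}\big|f\big(\mathbf{x}, \mathbf{y}^{\ast}_{\mathbf{q}}(\mathbf{x})\big) - f\big(\mathbf{x}, \mathbf{y}^{\ast}(\mathbf{x})\big)\big| \le \frac{L_f}{\mu_g}\,\mathbb{E}\|\mathbf{q}\|.$$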

Figures (1)

  • Figure 1: The evolution of the objective value $F(\mathbf{x})=f(\mathbf{x},\mathbf{y}^{\ast}(\mathbf{x}))$ over time for three different algorithms and different values of the dimensions $d_{u}$, $d_{l}$, and $k$.

Theorems & Definitions (43)

  • Proposition 2.1
  • Proof
  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Lemma 2.2
  • Proposition 2.3
  • Proof
  • Remark 1
  • Lemma 2.4: Approximation bias
  • ...and 33 more