A Doubly Stochastically Perturbed Algorithm for Linearly Constrained Bilevel Optimization
Prashant Khanduri, Ioannis Tsaknakis, Yihua Zhang, Sijia Liu, Mingyi Hong
TL;DR
The paper tackles stochastic bilevel optimization in which the lower-level (LL) problem is strongly convex and linearly constrained, a setting where the LL solution map can be non-differentiable. It introduces a smoothing technique based on random perturbations that makes the stochastic implicit objective differentiable, and derives closed-form stochastic gradients from the KKT system of the perturbed LL problem. Building on this, the authors develop DS-BLO, a doubly stochastic algorithm that uses two perturbations to achieve dimension-free finite-time convergence to an $(\epsilon, \overline{\delta})$-Goldstein stationary point of the stochastic objective, with a link to the original problem under additional assumptions. Experiments on adversarial training and synthetic bilevel problems support the method's efficiency and robustness, and the authors report improvements over state-of-the-art approaches. Overall, DS-BLO provides a scalable, theoretically grounded framework for stochastic bilevel optimization with LL constraints that does not rely on restrictive differentiability assumptions or on access to dual variables.
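For readers unfamiliar with the stationarity notion used here, the standard definition of Goldstein stationarity can be written as follows. The symbols are assumed for illustration and do not come from the summary above: $F$ is the (locally Lipschitz) objective, $\partial F$ its Clarke subdifferential, and $\mathbb{B}_{\overline{\delta}}(x)$ the Euclidean ball of radius $\overline{\delta}$ around $x$.

$$
\partial_{\overline{\delta}} F(x) := \mathrm{conv}\Big( \bigcup_{z \in \mathbb{B}_{\overline{\delta}}(x)} \partial F(z) \Big),
\qquad
x \text{ is } (\epsilon, \overline{\delta})\text{-Goldstein stationary if } \;
\mathrm{dist}\big(0, \partial_{\overline{\delta}} F(x)\big) \le \epsilon .
$$

In words, some convex combination of (sub)gradients taken at points within distance $\overline{\delta}$ of $x$ has norm at most $\epsilon$.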
Abstract
In this work, we develop analysis and algorithms for a class of (stochastic) bilevel optimization problems whose lower-level (LL) problem is strongly convex and linearly constrained. Most existing approaches for solving such problems rely on unrealistic assumptions or penalty function-based approximate reformulations that are not necessarily equivalent to the original problem. Specifically, we develop a stochastic algorithm based on an implicit gradient approach, suitable for data-intensive applications. It is well known that, for the class of problems of interest, the implicit function is nonsmooth. To circumvent this difficulty, we apply a smoothing technique that involves adding small random (linear) perturbations to the LL objective and then taking the expectation of the implicit objective over these perturbations. This approach gives rise to a novel stochastic formulation that ensures the differentiability of the implicit function and leads to the design of a novel and efficient doubly stochastic algorithm. We show that the proposed algorithm converges to an $(\epsilon, \overline{\delta})$-Goldstein stationary point of the stochastic objective in $\widetilde{O}(\epsilon^{-4} \overline{\delta}^{-1})$ iterations. Moreover, under certain additional assumptions, we establish the same convergence guarantee for the algorithm to achieve a $(3\epsilon, \overline{\delta} + O(\epsilon))$-Goldstein stationary point of the original objective. Finally, we perform experiments on adversarial training (AT) tasks to showcase the utility of the proposed algorithm.
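The smoothing idea can be illustrated on a one-dimensional toy problem. The sketch below is not the paper's DS-BLO algorithm: the LL problem, the upper-level loss, the Gaussian perturbation, and all function names are assumptions chosen so that everything has a closed form. The LL problem is $\min_{y \ge 0} \tfrac{1}{2}(y - x)^2 + q y$, whose solution $\max(x - q, 0)$ has a kink at $x = q$; averaging the implicit gradient over random $q$ removes the jump.

```python
import numpy as np

def ll_solution(x, q):
    """Closed-form solution of the toy LL problem
    min_y 0.5*(y - x)**2 + q*y  subject to  y >= 0."""
    return np.maximum(x - q, 0.0)

def implicit_grad(x, q):
    """Gradient w.r.t. x of the toy upper-level loss (y_q(x) - 1)**2
    for a fixed perturbation q (q = 0 gives the unperturbed problem)."""
    y = ll_solution(x, q)
    # From the KKT system: dy/dx = 1 when the constraint is inactive, else 0.
    dy_dx = np.where(x - q > 0.0, 1.0, 0.0)
    return 2.0 * (y - 1.0) * dy_dx

def smoothed_grad(x, sigma, n_samples, rng):
    """Monte Carlo estimate of the gradient of E_q[(y_q(x) - 1)**2],
    with q ~ N(0, sigma**2) (the Gaussian choice is an assumption)."""
    q = rng.normal(0.0, sigma, size=n_samples)
    return implicit_grad(x, q).mean()
```

Without perturbation, `implicit_grad(x, 0.0)` jumps from 0 to about -2 as `x` crosses zero. With perturbation, `smoothed_grad` varies continuously through `x = 0`, and a single sample (`n_samples=1`) already gives an unbiased stochastic gradient of the smoothed objective away from the kink, which is the role the LL perturbation plays in the abstract's formulation.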
