Conditional Gradient Methods with Standard LMO for Stochastic Simple Bilevel Optimization
Khanh-Hung Giang-Tran, Soroosh Shafiee, Nam Ho-Nguyen
TL;DR
This work tackles stochastic simple bilevel optimization with a convex inner problem by introducing projection-free, iteratively regularized conditional gradient methods that rely only on linear optimization over the base set $X$. Using the STORM estimator in the one-sample setting and the SPIDER estimator in the finite-sum setting, the IR-SCG and IR-FSCG algorithms employ a vanishing regularization sequence to balance the outer objective against the inner-level optimality constraint. They achieve non-asymptotic convergence rates for convex outer objectives and stationarity guarantees for nonconvex outer objectives. The theory establishes high-probability rates of $O(t^{-1/4})$ for both the outer and inner objectives in the one-sample convex case, $O(t^{-1/2})$ in the finite-sum convex case, and $O(t^{-1/7})$ (one-sample) or $O(t^{-1/4})$ (finite-sum) when the outer objective is nonconvex. None of these results require linear optimization over intersections of $X$ with halfspaces, and all hold as anytime guarantees. Experiments on over-parameterized regression and dictionary learning show that the proposed methods are practical and scalable, improve substantially over both projection-based and earlier projection-free approaches, and behave consistently with the theoretical rates.
Abstract
We propose efficient methods for solving stochastic simple bilevel optimization problems with convex inner levels, where the goal is to minimize an outer stochastic objective function over the solution set of an inner stochastic optimization problem. Existing methods often rely on costly projection or linear optimization oracles over complex sets, which limits their scalability. To overcome this, we propose an iteratively regularized conditional gradient approach that uses linear optimization oracles exclusively over the base feasible set. Our methods employ a vanishing regularization sequence that progressively emphasizes the inner problem while biasing the iterates toward inner solutions with minimal outer objective value. In the one-sample stochastic setting and under standard convexity assumptions, we establish non-asymptotic convergence rates of $O(t^{-1/4})$ for both the outer and inner objectives. In the finite-sum setting with a mini-batch scheme, the corresponding rates improve to $O(t^{-1/2})$. When the outer objective is nonconvex, we prove non-asymptotic convergence rates of $O(t^{-1/7})$ for both the outer and inner objectives in the one-sample stochastic setting, and $O(t^{-1/4})$ in the finite-sum setting. Experimental results on over-parameterized regression and dictionary learning tasks demonstrate the practical advantages of our approach over existing methods and confirm our theoretical findings.
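To make the iteration concrete, here is a minimal Python sketch of an iteratively regularized stochastic conditional gradient loop with a STORM-type estimator for the inner gradient. It is not the authors' IR-SCG implementation. Every specific below is an assumption made for illustration: the toy problem (an over-parameterized least-squares inner objective $g$ whose minimizers form a set, and the outer objective $f(x) = \tfrac12\|x\|^2$ that prefers small-norm solutions), the choice of $X$ as an $\ell_1$ ball, the regularized direction $\nabla g + \sigma_t \nabla f$, and the schedules for the step size $\gamma_t$, regularization weight $\sigma_t$, and momentum $\rho_t$. The helper names (`lmo_l1_ball`, `inner_sample_grad`, `ir_storm_cg`) are hypothetical. The only operation performed on $X$ is a linear minimization, which for the $\ell_1$ ball reduces to picking one signed coordinate.

```python
import numpy as np


def lmo_l1_ball(grad, radius):
    """Linear minimization oracle for the l1 ball {s : ||s||_1 <= radius}.

    Returns argmin_s <grad, s>, attained at a signed, scaled basis vector.
    """
    i = int(np.argmax(np.abs(grad)))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s


def inner_sample_grad(A, b, x, i):
    """Gradient of the single-sample inner loss 0.5 * (a_i^T x - b_i)^2."""
    return A[i] * (A[i] @ x - b[i])


def ir_storm_cg(A, b, radius=3.0, iters=10_000, seed=0):
    """Hypothetical sketch: iteratively regularized stochastic conditional
    gradient with a STORM-type estimator of the inner gradient.

    Inner:  g(x) = (1 / (2n)) * ||A x - b||^2  (one sampled row per iteration)
    Outer:  f(x) = 0.5 * ||x||^2               (deterministic, for simplicity)
    Base set X: l1 ball of the given radius (assumed).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)            # x_0 = 0 lies in X
    x_prev = x.copy()
    est = np.zeros(d)          # running STORM estimate of grad g
    for t in range(iters):
        gamma = 2.0 / (t + 2)          # step size (assumed schedule)
        sigma = 1.0 / np.sqrt(t + 1)   # vanishing weight on the outer objective (assumed)
        i = rng.integers(n)            # one fresh sample per iteration
        if t == 0:
            est = inner_sample_grad(A, b, x, i)
        else:
            rho = gamma                # momentum parameter (assumed schedule)
            # STORM correction: the same sample i is evaluated at x_t and x_{t-1}
            est = inner_sample_grad(A, b, x, i) + (1 - rho) * (
                est - inner_sample_grad(A, b, x_prev, i))
        direction = est + sigma * x    # estimated gradient of g + sigma_t * f
        s = lmo_l1_ball(direction, radius)
        x_prev = x
        x = x + gamma * (s - x)        # convex combination keeps x in X
    return x


# Assumed toy instance: 10 equations, 20 unknowns, so g has many minimizers.
rng = np.random.default_rng(1)
n, d = 10, 20
A = rng.standard_normal((n, d)) / np.sqrt(d)
x_true = np.zeros(d)
x_true[:3] = [1.0, -1.0, 0.5]          # ||x_true||_1 = 2.5, inside the radius-3 ball
b = A @ x_true


def inner_obj(z):
    """Inner objective g(z) = (1 / (2n)) * ||A z - b||^2."""
    return 0.5 * np.mean((A @ z - b) ** 2)


x = ir_storm_cg(A, b)
print(inner_obj(np.zeros(d)), inner_obj(x), np.abs(x).sum())
```

Each update is a convex combination of the current iterate and an LMO output, so the iterates never leave $X$ and no projection is needed. The STORM correction reuses the same sample at $x_t$ and $x_{t-1}$, which reduces the variance of the one-sample gradient estimate without mini-batches.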
