Reducing Contextual Stochastic Bilevel Optimization via Structured Function Approximation
Maxime Bouscary, Jiawei Zhang, Saurabh Amin
TL;DR
This work addresses Contextual Stochastic Bilevel Optimization (CSBO), in which the lower-level problem depends on context. CSBO is typically intractable because existing methods rely on conditional sampling and must solve a separate inner problem for each sampled context. The proposed reduction parameterizes the context-dependent lower-level solution $y^\star(x,\xi)$ with an expressive basis $\Phi$, yielding a standard SBO problem that can be solved using only samples from the joint distribution $\mathbb{P}_{(\xi,\eta)}$. Theoretical results show that, when the basis is sufficiently expressive and well-conditioned, the hypergradient of the reduced problem closely approximates the true hypergradient, and an $\epsilon$-stationary CSBO solution can be obtained with $\widetilde{O}(\epsilon^{-3})$ complexity. Chebyshev polynomials are shown to satisfy the required conditions, enabling near-optimal rates for a broad class of problems. Empirical tests on inverse optimization and hyperparameter optimization show faster convergence, improved sample efficiency, and lower memory usage than partition-based CSBO baselines.
Abstract
Contextual Stochastic Bilevel Optimization (CSBO) extends standard stochastic bilevel optimization (SBO) by incorporating context-dependent lower-level problems. CSBO problems are generally intractable since existing methods require solving a distinct lower-level problem for each sampled context, resulting in prohibitive sample and computational complexity, in addition to relying on impractical conditional sampling oracles. We propose a reduction framework that approximates the lower-level solutions using expressive basis functions, thereby decoupling the lower-level dependence on context and transforming CSBO into a standard SBO problem solvable using only joint samples from the context and noise distribution. First, we show that this reduction preserves hypergradient accuracy and yields an $\epsilon$-stationary solution to CSBO. Then, we relate the sample complexity of the reduced problem to simple metrics of the basis. This establishes sufficient criteria for a basis to yield $\epsilon$-stationary solutions with a near-optimal complexity of $\widetilde{O}(\epsilon^{-3})$, matching the best-known rate for standard SBO up to logarithmic factors. Moreover, we show that Chebyshev polynomials provide a concrete and efficient choice of basis that satisfies these criteria for a broad class of problems. Empirical results on inverse and hyperparameter optimization demonstrate that our approach outperforms CSBO baselines in convergence, sample efficiency, and memory usage.
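To make the basis parameterization concrete, the following is a minimal sketch on a toy problem, not the paper's algorithm. It assumes a scalar context $\xi \in [-1, 1]$, a fixed upper-level variable $x$, and a hypothetical lower-level objective $g(x, y, \xi) = \tfrac{1}{2}(y - x\cos(2\xi))^2$, whose exact solution is $y^\star(x,\xi) = x\cos(2\xi)$. Replacing $y$ with $\Phi(\xi)^\top w$ turns the family of per-context problems into a single problem in the weights $w$, which for this quadratic objective is an ordinary least-squares fit over sampled contexts. All function names and the toy objective are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def chebyshev_features(xi, degree):
    """Evaluate T_0..T_degree at scalar contexts xi in [-1, 1].

    Returns an array of shape (n, degree + 1).
    """
    return C.chebvander(np.asarray(xi, dtype=float), degree)


def y_star(x, xi):
    """Exact lower-level solution of the toy objective (an assumption)."""
    return x * np.cos(2.0 * xi)


def fit_weights(x, xi_samples, degree):
    """Solve the sample-average reduced lower-level problem.

    min_w  mean_i 0.5 * (Phi(xi_i)^T w - x * cos(2 * xi_i))^2
    For this quadratic toy objective it is a linear least-squares problem.
    """
    Phi = chebyshev_features(xi_samples, degree)
    w, *_ = np.linalg.lstsq(Phi, y_star(x, xi_samples), rcond=None)
    return w


# Sample contexts only; no per-context inner solve is needed.
rng = np.random.default_rng(0)
xi_train = rng.uniform(-1.0, 1.0, size=200)

x = 1.5
degree = 8
w = fit_weights(x, xi_train, degree)

# Compare the parameterized solution with the exact one on a test grid.
xi_test = np.linspace(-1.0, 1.0, 101)
y_approx = chebyshev_features(xi_test, degree) @ w
max_err = np.max(np.abs(y_approx - y_star(x, xi_test)))
```

In this sketch a single weight vector `w` stands in for the whole map $\xi \mapsto y^\star(x,\xi)$, which is the sense in which the dependence on context is decoupled. In a full bilevel method, `w` would be re-optimized or tracked as $x$ changes rather than fitted once.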
