A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization
Mathieu Dagréou, Thomas Moreau, Samuel Vaiter, Pierre Ablin
TL;DR
The paper tackles bilevel empirical risk minimization, where both the outer and inner objectives are finite sums, by proposing SRBA, a variance-reduced bilevel extension of SARAH that jointly updates the outer, inner, and hypergradient directions. The authors prove that SRBA needs $O((n+m)^{1/2}\varepsilon^{-1})$ oracle calls to reach an $\varepsilon$-stationary point, and they establish a lower bound of $\Omega(m^{1/2}\varepsilon^{-1})$ through a worst-case construction. The two bounds match when $n$ and $m$ are of comparable size, so SRBA is near-optimal in that common regime. The analysis rests on recursive estimates of the three directions, a controlled hypergradient approximation $D_x(z,v,x)$, and descent lemmas combined through a Lyapunov function. Experiments show fast convergence and strong final performance relative to state-of-the-art bilevel solvers on synthetic problems and machine learning tasks, including hyperparameter selection and data cleaning.
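For orientation, the setting and the three directions mentioned above can be written out as follows. This is a sketch in the notation commonly used for this family of methods, not a transcription of the paper: the names $F_j$, $G_i$, $h$, $z^*$, $v^*$, $D_z$, and $D_v$, as well as the assignment of $m$ to the outer sum and $n$ to the inner sum, are assumptions. Only $D_x(z,v,x)$ and the sample counts $n$ and $m$ appear in the summary itself.

```latex
% Bilevel finite-sum problem (assumed notation)
\min_{x} \; h(x) = F(z^*(x), x),
\qquad z^*(x) = \arg\min_{z} G(z, x),
\qquad F = \frac{1}{m}\sum_{j=1}^{m} F_j,
\quad  G = \frac{1}{n}\sum_{i=1}^{n} G_i .

% Three directions, one per variable (z: inner, v: linear system, x: outer)
D_z(z, v, x) = \nabla_1 G(z, x), \\
D_v(z, v, x) = \nabla^2_{11} G(z, x)\, v + \nabla_1 F(z, x), \\
D_x(z, v, x) = \nabla^2_{21} G(z, x)\, v + \nabla_2 F(z, x).

% At the exact inner solution, D_x recovers the hypergradient:
v^*(x) = -\big[\nabla^2_{11} G(z^*(x), x)\big]^{-1} \nabla_1 F(z^*(x), x),
\qquad \nabla h(x) = D_x\big(z^*(x), v^*(x), x\big).
```

Under these assumptions, $D_z$ and $D_v$ both vanish at $(z^*(x), v^*(x))$, which is why $D_x(z,v,x)$ can serve as a hypergradient approximation whose error is controlled by how far $z$ and $v$ are from their targets.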
Abstract
Bilevel optimization problems, in which two optimization problems are nested, have a growing number of applications in machine learning. In many practical cases, the upper and the lower objectives correspond to empirical risk minimization problems and therefore have a sum structure. In this context, we propose a bilevel extension of the celebrated SARAH algorithm. We demonstrate that the algorithm requires $\mathcal{O}((n+m)^{\frac12}\varepsilon^{-1})$ oracle calls to achieve $\varepsilon$-stationarity, where $n+m$ is the total number of samples, which improves over all previous bilevel algorithms. Moreover, we provide a lower bound on the number of oracle calls required to get an approximate stationary point of the objective function of the bilevel problem. This lower bound is attained by our algorithm, making it optimal in terms of sample complexity.
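The SARAH recursion that the abstract builds on can be illustrated on a single-level finite sum. The sketch below is not the bilevel algorithm from the paper: it applies the recursive gradient estimate to a least-squares problem and counts per-sample gradient (oracle) calls. The function name `sarah_least_squares`, the step size, and the inner-loop length equal to the number of samples are all hypothetical choices made for illustration.

```python
import numpy as np


def sarah_least_squares(A, b, step, n_outer=10, seed=0):
    """Single-level SARAH sketch on f(w) = (1/2n) * ||A w - b||^2.

    Returns the final iterate and the number of per-sample gradient
    (oracle) calls. Illustrative only; this is not the bilevel method.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    calls = 0

    def grad_i(w, i):
        # Gradient of the i-th summand 0.5 * (a_i . w - b_i)^2.
        return (A[i] @ w - b[i]) * A[i]

    for _ in range(n_outer):
        # Restart: exact full gradient, costing n oracle calls.
        v = A.T @ (A @ w - b) / n
        calls += n
        w_prev, w = w, w - step * v
        for _ in range(n):  # inner-loop length n is an assumed choice
            i = rng.integers(n)
            # Recursive estimate: previous direction plus a single-sample
            # gradient difference, costing 2 oracle calls.
            v = v + grad_i(w, i) - grad_i(w_prev, i)
            calls += 2
            w_prev, w = w, w - step * v
    return w, calls
```

A call such as `sarah_least_squares(A, b, step=0.2)` on a matrix `A` with unit-norm rows returns the iterate together with the oracle-call count, which is the quantity the complexity bounds above are stated in. Roughly speaking, the bilevel extension applies the same kind of recursive correction to several directions at once instead of to a single gradient.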
