Learning Theory for Kernel Bilevel Optimization
Fares El Khoury, Edouard Pauwels, Samuel Vaiter, Michael Arbel
TL;DR
This work establishes the first finite-sample generalization bounds for Kernel Bilevel Optimization (KBO), in which the inner problem is solved over a reproducing kernel Hilbert space (RKHS) and the outer objective is the expectation of a pointwise loss. The authors derive a gradient representation via functional implicit differentiation in the RKHS, introduce practical plug-in estimators for the value function and its gradient, and prove uniform generalization bounds using empirical process theory and degenerate U-processes. The resulting rates are of order $O\left(\frac{1}{\sqrt{m}}+\frac{1}{\sqrt{n}}\right)$, where $m$ and $n$ denote the sample sizes at the two levels of the problem. They also show that two gradient estimators are equivalent and provide convergence guarantees for bilevel gradient methods, supported by numerical experiments on synthetic instrumental variable regression. These results inform sample-computation trade-offs in nonparametric bilevel learning and guide kernel-based hyperparameter tuning and related tasks under distribution shift.
Abstract
Bilevel optimization has emerged as a technique for addressing a wide range of machine learning problems that involve an outer objective implicitly determined by the minimizer of an inner problem. While prior works have primarily focused on the parametric setting, a learning-theoretic foundation for bilevel optimization in the nonparametric case remains relatively unexplored. In this paper, we take a first step toward bridging this gap by studying Kernel Bilevel Optimization (KBO), where the inner objective is optimized over a reproducing kernel Hilbert space. This setting enables rich function approximation while providing a foundation for rigorous theoretical analysis. In this context, we derive novel finite-sample generalization bounds for KBO, leveraging tools from empirical process theory. These bounds further allow us to assess the statistical accuracy of gradient-based methods applied to the empirical discretization of KBO. We numerically illustrate our theoretical findings on a synthetic instrumental variable regression task.
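To make the setting concrete, the following is a minimal sketch of an empirical kernel bilevel problem, not the estimator analyzed in the paper. It assumes a Gaussian kernel, a squared loss at both levels, and a scalar outer variable `lam` that acts as the ridge penalty of the inner problem; the function names and the synthetic data are hypothetical. The inner problem is kernel ridge regression on $n$ inner samples, the outer objective is the mean squared error on $m$ outer samples, and the gradient with respect to `lam` is obtained by implicitly differentiating the inner optimality condition.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def outer_value_and_grad(lam, X_in, y_in, X_out, y_out):
    """Empirical outer loss and its derivative with respect to lam.

    Inner problem (assumed): min_h (1/n) sum_i (h(x_i) - y_i)^2 + lam * ||h||^2,
    whose minimizer is h = sum_i alpha_i k(x_i, .) with (K + n*lam*I) alpha = y.
    """
    n, m = len(y_in), len(y_out)
    K = gaussian_kernel(X_in, X_in)
    K_out = gaussian_kernel(X_out, X_in)
    A = K + n * lam * np.eye(n)
    alpha = np.linalg.solve(A, y_in)

    # Implicit differentiation of A(lam) alpha(lam) = y:
    # n * alpha + A * dalpha/dlam = 0, hence dalpha/dlam = -n * A^{-1} alpha.
    dalpha = -n * np.linalg.solve(A, alpha)

    resid = K_out @ alpha - y_out
    value = np.mean(resid**2)
    grad = (2.0 / m) * resid @ (K_out @ dalpha)
    return value, grad

# Synthetic data (hypothetical): noisy samples of a sine function.
rng = np.random.default_rng(0)
X_in = rng.uniform(-1.0, 1.0, size=(40, 1))
y_in = np.sin(3.0 * X_in[:, 0]) + 0.1 * rng.standard_normal(40)
X_out = rng.uniform(-1.0, 1.0, size=(30, 1))
y_out = np.sin(3.0 * X_out[:, 0]) + 0.1 * rng.standard_normal(30)

lam = 0.1
value, grad = outer_value_and_grad(lam, X_in, y_in, X_out, y_out)

# Central finite difference as a sanity check on the implicit gradient.
eps = 1e-6
v_plus, _ = outer_value_and_grad(lam + eps, X_in, y_in, X_out, y_out)
v_minus, _ = outer_value_and_grad(lam - eps, X_in, y_in, X_out, y_out)
fd_grad = (v_plus - v_minus) / (2.0 * eps)
```

In this toy case the inner solution has a closed form, so the implicit gradient can be compared directly with the finite-difference estimate `fd_grad`. In the general setting described in the abstract, the outer variable may enter the inner objective in other ways, and the implicit differentiation is carried out at the level of functions in the RKHS.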
