Functional Bilevel Optimization for Machine Learning
Ieva Petrulionyte, Julien Mairal, Michael Arbel
TL;DR
This work reframes bilevel optimization in machine learning as a functional problem: the inner objective is minimized over a function space $\mathcal{H}$ rather than over neural network parameters. Using functional implicit differentiation in $L_2$ spaces, it derives a stable total gradient formula $\nabla\mathcal{F}(\omega)=g_\omega + B_\omega a_\omega^\star$ and introduces FuncID, a scalable algorithm that learns both the inner prediction function and the adjoint function with neural networks. The paper proves differentiability and convergence guarantees under mild assumptions and demonstrates practical benefits on instrumental-variable regression (2SLS) and model-based reinforcement learning (CartPole), showing improved stability, faster convergence, and competitive or superior performance compared with parametric bilevel methods. The functional view enables using deep networks as inner predictors while mitigating the ill-posedness and ambiguity that arise from multiple inner solutions. Open directions include extensions to RKHS, non-smooth objectives, and broader ML tasks.
Abstract
In this paper, we introduce a new functional point of view on bilevel optimization problems for machine learning, where the inner objective is minimized over a function space. These types of problems are most often solved by using methods developed in the parametric setting, where the inner objective is strongly convex with respect to the parameters of the prediction function. The functional point of view does not rely on this assumption and notably allows using over-parameterized neural networks as the inner prediction function. We propose scalable and efficient algorithms for the functional bilevel optimization problem and illustrate the benefits of our approach on instrumental regression and reinforcement learning tasks.
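To make the total gradient formula $\nabla\mathcal{F}(\omega)=g_\omega + B_\omega a_\omega^\star$ concrete, here is a minimal sketch on a toy problem. It is not the paper's FuncID implementation. It assumes the usual implicit-differentiation reading of the symbols: $g_\omega$ is the partial gradient of the outer loss in $\omega$, $B_\omega$ is the cross derivative of the inner loss, and $a_\omega^\star$ is the adjoint, obtained here in closed form from the inner Hessian instead of being learned by a network. The function $h$ is represented by its values on a few sample points, and the quadratic losses and all names (`inner_solution`, `outer_loss`, `total_gradient`, `lam`) are hypothetical choices made for illustration.

```python
# Toy bilevel problem (illustrative assumption, not taken from the paper):
#   inner:  L_in(omega, h)  = 1/(2n) * sum_i (h_i - omega * x_i)^2
#   outer:  L_out(omega, h) = 1/(2n) * sum_i (h_i - y_i)^2 + lam/2 * omega^2
# The "function" h is stored as its values h_i on n sample points.

def inner_solution(omega, xs):
    # Minimizer of L_in over h: each value h_i equals omega * x_i.
    return [omega * x for x in xs]

def outer_loss(omega, h, ys, lam):
    n = len(ys)
    fit = sum((hi - yi) ** 2 for hi, yi in zip(h, ys)) / (2 * n)
    return fit + 0.5 * lam * omega ** 2

def total_objective(omega, xs, ys, lam):
    # F(omega) = L_out(omega, h*_omega)
    return outer_loss(omega, inner_solution(omega, xs), ys, lam)

def total_gradient(omega, xs, ys, lam):
    n = len(xs)
    h_star = inner_solution(omega, xs)
    # g: partial derivative of L_out with respect to omega, at fixed h.
    g = lam * omega
    # d: partial derivative of L_out with respect to h, at h*.
    d = [(hi - yi) / n for hi, yi in zip(h_star, ys)]
    # The Hessian of L_in in h is C = I/n, so the adjoint a* = -C^{-1} d = -n * d.
    a_star = [-n * di for di in d]
    # B: cross derivative of L_in; it acts on a as sum_i (-x_i / n) * a_i.
    b_a = sum((-x / n) * a for x, a in zip(xs, a_star))
    return g + b_a

def finite_difference(omega, xs, ys, lam, eps=1e-6):
    # Central difference on F, used only to check the formula numerically.
    up = total_objective(omega + eps, xs, ys, lam)
    down = total_objective(omega - eps, xs, ys, lam)
    return (up - down) / (2 * eps)

if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 7.0]
    print(total_gradient(0.5, xs, ys, lam=0.1))
    print(finite_difference(0.5, xs, ys, lam=0.1))
```

In this toy case the inner problem has a unique closed-form solution, so the two printed values should agree up to numerical error. The setting targeted by the paper replaces both the inner solution and the adjoint with learned functions.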
