Glocal Hypergradient Estimation with Koopman Operator
Ryuichiro Hataya, Yoshinobu Kawahara
TL;DR
This paper tackles the inefficiency of gradient-based hyperparameter optimization by proposing glocal hypergradient estimation, which leverages the Koopman operator to infer global hypergradients from a trajectory of local hypergradients. By approximating the global hypergradient with a finite-dimensional linear model derived from the local dynamics, the method enables greedy hyperparameter updates that combine the reliability of global optimization with the speed of local updates. Theoretical analysis provides complexity bounds and an error guarantee relative to the true global hypergradient, while experiments on optimizer hyperparameters and data reweighting show performance close to global methods with substantial efficiency gains. The approach offers a scalable framework for bi-level optimization in deep learning, with potential extensions to stochastic and more complex meta-learning settings.
Abstract
Gradient-based hyperparameter optimization methods update hyperparameters using hypergradients, i.e., gradients of a meta criterion with respect to hyperparameters. Previous research has used two distinct update strategies: optimizing hyperparameters using global hypergradients obtained after completing model training, or using local hypergradients derived after every few model updates. While global hypergradients offer reliability, their computational cost is significant; conversely, local hypergradients provide speed but are often suboptimal. In this paper, we propose *glocal* hypergradient estimation, blending "global" quality with "local" efficiency. To this end, we use Koopman operator theory to linearize the dynamics of hypergradients so that the global hypergradients can be efficiently approximated using only a trajectory of local hypergradients. Consequently, we can optimize hyperparameters greedily using estimated global hypergradients, achieving both reliability and efficiency. Through numerical experiments on hyperparameter optimization, including optimization of optimizers, we demonstrate the effectiveness of glocal hypergradient estimation.
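To make the core idea concrete, the following is a minimal sketch of extrapolating a trajectory of vectors to its limit with a fitted linear model. It is not the authors' implementation: it assumes a plain least-squares (DMD-style) fit as a stand-in for the finite-dimensional Koopman approximation, and the function names `fit_linear_operator` and `estimate_limit`, the tolerance `tol`, and the synthetic trajectory are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_linear_operator(traj):
    """Least-squares fit of A such that traj[t + 1] ~= A @ traj[t] (DMD-style).

    traj: array of shape (T, d), one (local hypergradient) vector per row.
    """
    X = traj[:-1].T  # snapshots g_0 .. g_{T-2}, shape (d, T-1)
    Y = traj[1:].T   # snapshots g_1 .. g_{T-1}, shape (d, T-1)
    return Y @ np.linalg.pinv(X)


def estimate_limit(traj, tol=1e-3):
    """Extrapolate the trajectory to its limit under the fitted linear model.

    Simplifying assumption of this sketch: modes whose eigenvalue is within
    `tol` of 1 persist; every other mode is treated as transient and dropped.
    """
    A = fit_linear_operator(traj)
    eigvals, eigvecs = np.linalg.eig(A)
    # Express the last observed vector in the eigenbasis of A.
    g_last = traj[-1].astype(eigvecs.dtype)
    coeffs = np.linalg.lstsq(eigvecs, g_last, rcond=None)[0]
    keep = np.abs(eigvals - 1.0) < tol
    return (eigvecs[:, keep] @ coeffs[keep]).real


# Synthetic stand-in for a local-hypergradient trajectory (illustrative only):
# one persistent component plus two geometrically decaying transients.
V = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
traj = np.stack([
    2.0 * V[:, 0] + 0.8**t * V[:, 1] - 0.5**t * V[:, 2]
    for t in range(10)
])
estimate = estimate_limit(traj)
print(estimate)
```

In this noise-free toy case the estimate should lie close to the persistent component `2.0 * V[:, 0]`, even though the last observed vector still carries transient terms. With noisy or high-dimensional trajectories, a practical version would likely need rank truncation and a more careful treatment of eigenvalues near the unit circle.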
