Improving Hyperparameter Optimization with Checkpointed Model Weights
Nikhil Mehta, Jonathan Lorraine, Steve Masson, Ramanathan Arunachalam, Zaid Pervaiz Bhat, James Lucas, Arun George Zachariah
TL;DR
Forecasting Model Search (FMS) addresses the high expense of hyperparameter optimization by conditioning a Gaussian process surrogate on both hyperparameters and logged checkpoint weights via a permutation-invariant graph metanetwork (PIGMN). Built atop Dynamic Multifidelity HPO (DyHPO), FMS enhances prediction accuracy by encoding architecture- and training-dynamics information from weight checkpoints $\boldsymbol{W}$ into the surrogate, guiding budgeted evaluations for model selection from hubs and subsequent fine-tuning. Empirical results across multiple model hubs and datasets show that FMS-GMN achieves higher ranking quality (Kendall's $\tau$) and lower regret across compute budgets, with demonstrated transfer to unseen architectures and datasets. The approach is implemented with open-source code, enabling broader adoption and future extension to scalable surrogates and richer metadata integration.
Abstract
When training deep learning models, performance depends largely on the selected hyperparameters. However, hyperparameter optimization (HPO) is often one of the most expensive parts of model design. Classical HPO methods treat this as a black-box optimization problem. Gray-box HPO methods, which incorporate more information about the setup, have emerged as a promising direction for more efficient optimization; for example, they can use intermediate loss evaluations to terminate bad selections early. In this work, we propose an HPO method for neural networks that uses logged checkpoints of the trained weights to guide future hyperparameter selections. Our method, Forecasting Model Search (FMS), embeds weights into a Gaussian process deep kernel surrogate model, using a permutation-invariant graph metanetwork to be data-efficient with the logged network weights. To facilitate reproducibility and further research, we open-source our code at https://github.com/NVlabs/forecasting-model-search.
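To make the surrogate idea concrete, the following is a minimal sketch, not the FMS implementation. It assumes a one-hidden-layer MLP checkpoint and replaces the graph metanetwork with a much simpler stand-in: fixed mean and standard-deviation pooling over hidden neurons, which is permutation-invariant but not learned. The names `embed_checkpoint`, `features`, `rbf_kernel`, and `gp_posterior_mean` are hypothetical and do not come from the released code. The Gaussian process uses a plain RBF kernel on the concatenated hyperparameters and weight embedding.

```python
import numpy as np

def embed_checkpoint(w1, w2):
    """Permutation-invariant summary of a one-hidden-layer MLP checkpoint.

    w1: (hidden, inputs) first-layer weights.
    w2: (outputs, hidden) second-layer weights.
    Each hidden neuron is described by its incoming and outgoing weights;
    pooling over neurons makes the result independent of neuron order.
    This is an assumed stand-in for the learned graph metanetwork.
    """
    per_neuron = np.concatenate([w1, w2.T], axis=1)  # (hidden, inputs + outputs)
    return np.concatenate([per_neuron.mean(axis=0), per_neuron.std(axis=0)])

def features(hparams, w1, w2):
    """Concatenate hyperparameters with the checkpoint embedding."""
    return np.concatenate([np.asarray(hparams, dtype=float), embed_checkpoint(w1, w2)])

def rbf_kernel(a, b, lengthscale=1.0):
    """RBF kernel between the rows of a (n, d) and the rows of b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-3):
    """Posterior mean of a zero-mean GP with observation noise variance `noise`."""
    k_train = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_test = rbf_kernel(x_test, x_train)
    return k_test @ np.linalg.solve(k_train, y_train)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w1, w2 = rng.normal(size=(8, 4)), rng.normal(size=(1, 8))
    perm = rng.permutation(8)  # relabel the hidden neurons
    same = np.allclose(embed_checkpoint(w1, w2),
                       embed_checkpoint(w1[perm], w2[:, perm]))
    print("embedding unchanged under neuron permutation:", same)
```

Permuting the hidden neurons changes the stored weight arrays but not the function the network computes, so a surrogate that treats permuted checkpoints identically does not need to relearn that symmetry from data. In a deep kernel, the embedding would be trained jointly with the kernel hyperparameters rather than fixed as it is here.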
