Iterated Population Based Training with Task-Agnostic Restarts

Alexander Chebykin, Tanja Alderliesten, Peter A. N. Bosman

TL;DR

The paper addresses efficient hyperparameter optimization for deep learning under tight budgets. It introduces IPBT, which automatically adjusts the PBT step size (the number of weight-training steps between HP updates) through restarts. Each restart reuses partially trained weights in a task-agnostic manner and reinitializes HPs via time-varying Bayesian optimization. This iterated approach enables rapid early gains and eventual long-horizon optimization while preserving efficiency, making it suitable for black-box problems with a limited budget. Across 8 image-classification and reinforcement-learning tasks, IPBT matches or surpasses prior PBT variants and popular HPO baselines without task-specific tuning. Ablations confirm the effectiveness of stagnation-based restarts, shrink-perturb weight reuse, and exponential step-size growth. The work demonstrates a robust, out-of-the-box method for discovering useful HP schedules and names multi-objective optimization and NAS as directions for future extensions of IPBT.

Abstract

Hyperparameter Optimization (HPO) can lift the burden of tuning hyperparameters (HPs) of neural networks. HPO algorithms from the Population Based Training (PBT) family are efficient thanks to dynamically adjusting HPs every few steps of the weight optimization. Recent results indicate that the number of steps between HP updates is an important meta-HP of all PBT variants that can substantially affect their performance. Yet, no method or intuition is available for efficiently setting its value. We introduce Iterated Population Based Training (IPBT), a novel PBT variant that automatically adjusts this HP via restarts that reuse weight information in a task-agnostic way and leverage time-varying Bayesian optimization to reinitialize HPs. Evaluation on 8 image classification and reinforcement learning tasks shows that, on average, our algorithm matches or outperforms 5 previous PBT variants and other HPO algorithms (random search, ASHA, SMAC3), without requiring a budget increase or any changes to its HPs. The source code is available at https://github.com/AwesomeLemon/IPBT.
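
To make the role of the step size concrete, the following is a minimal sketch of a generic PBT loop on a toy problem; it is not the IPBT implementation from the repository. The toy loss `w**2`, the truncation-style exploit rule, the perturbation factors `0.8` and `1.25`, and all function names are assumptions chosen for illustration. The parameter `step_inner` is the number of weight-optimization steps between HP updates, i.e. the meta-HP that the abstract says is hard to set.

```python
import random


def train(w, lr, n_steps):
    """Run n_steps of gradient descent on the toy loss w**2 (assumed objective)."""
    for _ in range(n_steps):
        w -= lr * 2.0 * w  # gradient of w**2 is 2*w
    return w


def score(w):
    """Higher is better: the negative toy loss."""
    return -(w ** 2)


def pbt(pop_size=8, outer_steps=10, step_inner=5, seed=0):
    rng = random.Random(seed)
    # Each population member carries its own weights and hyperparameters.
    pop = [{"w": 5.0, "lr": 10 ** rng.uniform(-4, -1)} for _ in range(pop_size)]
    for _ in range(outer_steps):
        # Inner steps: train the weights with the HPs held fixed.
        for member in pop:
            member["w"] = train(member["w"], member["lr"], step_inner)
        # HP update: the worst quarter copies a member of the best quarter
        # (exploit) and perturbs the copied learning rate (explore).
        pop.sort(key=lambda m: score(m["w"]), reverse=True)
        quarter = max(1, pop_size // 4)
        for loser in pop[-quarter:]:
            winner = rng.choice(pop[:quarter])
            loser["w"] = winner["w"]
            loser["lr"] = winner["lr"] * rng.choice([0.8, 1.25])
    return max(pop, key=lambda m: score(m["w"]))
```

Calling `pbt(step_inner=1)` versus `pbt(step_inner=50)` under the same total budget changes how often HPs are revised, which is the trade-off that IPBT is described as adjusting automatically.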

Paper Structure

This paper contains 31 sections, 10 figures, 4 tables, 1 algorithm.

Figures (10)

  • Figure 1: An example run of IPBT on the Humanoid task. Three iterations can be seen. Within each iteration, HPs are dynamically optimized during training via a PBT-like procedure (Section \ref{sec:rel:pbt}). If the performance is determined to be stagnating (Section \ref{sec:method:when}), a restart is triggered. Upon restart, information from weights and HPs of previous iterations is reused (Section \ref{sec:method:reuse}), and the step size of the PBT-like procedure is increased (Section \ref{sec:method:adjust_step}). New HPs are predicted via a time-varying meta BO procedure which is trained on the initial HPs of previous iterations as inputs and their descendants' maximum achieved scores as outputs (in the figure, the descendants of each initial solution share its color). The pseudocode of IPBT is provided in Appendix A.
  • Figure 2: General PBT loop: in each outer step, the weights are trained for $\mathrm{step_{inner}}$ inner steps and the HPs are explored.
  • Figure 3: Visualizations of the smoothed standardized scores when a trajectory is (a) improving vs. (b) stagnant.
  • Figure 4: Normalized performance across 8 tasks (IQM and CI) of IPBT and untuned PBT variants.
  • Figure 5: Normalized performance across 8 tasks (IQM and CI) of IPBT, tuned PBT variants, and baselines.
  • ...and 5 more figures
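
The restart ingredients named in the Figure 1 caption and the TL;DR (stagnation check, shrink-perturb weight reuse, exponential step-size growth) can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the authors' procedure: the shrink factor, noise scale, growth factor of 2, and the window-based stagnation test are all hypothetical placeholders. In particular, the paper's stagnation criterion operates on smoothed standardized scores (Figure 3), and its details are not given in this summary.

```python
import random


def shrink_perturb(weights, shrink=0.5, noise_std=0.01, rng=None):
    """Shrink weights toward zero and add Gaussian noise.

    The values of shrink and noise_std are assumed, not taken from the paper.
    """
    rng = rng or random.Random(0)
    return [shrink * w + rng.gauss(0.0, noise_std) for w in weights]


def next_step_size(step_inner, growth=2):
    """Grow the step size geometrically across restarts (growth=2 is assumed)."""
    return step_inner * growth


def is_stagnating(scores, window=3, tol=1e-3):
    """Hypothetical placeholder check, not the paper's criterion.

    Reports stagnation when the best score in the last `window` entries
    exceeds the best earlier score by at most `tol`.
    """
    if len(scores) <= window:
        return False
    return max(scores[-window:]) - max(scores[:-window]) <= tol
```

In a full loop, one would run the PBT-like procedure until `is_stagnating` returns true, then apply `shrink_perturb` to the reused weights and continue with `next_step_size(step_inner)`; the HP reinitialization via time-varying Bayesian optimization is omitted here.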