Hyperparameter Selection in Continual Learning
Thomas L. Lee, Sigrid Passano Hellan, Linus Ericsson, Elliot J. Crowley, Amos Storkey
TL;DR
This work tackles hyperparameter selection in continual learning (CL), where the data stream cannot be accessed all at once. It benchmarks a range of realistic hyperparameter optimisation (HPO) frameworks, spanning static and dynamic approaches, across standard CL methods and common benchmarks. The key finding is that no single HPO framework consistently outperforms the others on split-task or heterogeneous-task benchmarks; first-task HPO sometimes matches or slightly exceeds the others while remaining computationally efficient. Since default hyperparameters often underperform and no framework dominates, the authors argue for choosing an HPO framework with compute cost in mind and for evaluating HPO in CL on more realistic data streams. Overall, the paper sets a baseline for realistic HPO evaluation in CL and highlights the need for more representative benchmarks to drive progress.
Abstract
In continual learning (CL), where a learner trains on a stream of data, standard hyperparameter optimisation (HPO) cannot be applied, as the learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparameter settings. However, this end-of-training HPO is unusable in practice, since a learner can only see the stream once. Hence, there is an open question: what HPO framework should a practitioner use for a CL problem in reality? This paper addresses this question by comparing several realistic HPO frameworks. We find that none of the HPO frameworks considered, including end-of-training HPO, performs consistently better than the rest on popular CL benchmarks. We therefore arrive at a twofold conclusion: a) to be able to discriminate between HPO frameworks, there is a need to move beyond the current most commonly used CL benchmarks, and b) on the popular CL benchmarks examined, a CL practitioner should use a realistic HPO framework and can select it based on factors separate from performance, for example compute efficiency.
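
To make the contrast concrete, the following is a minimal sketch of end-of-training HPO next to one realistic alternative, first-task HPO. It is an illustration under stated assumptions, not the authors' implementation: the scalar learner, the function names (`train`, `loss`, `end_of_training_hpo`, `first_task_hpo`), the toy stream, and the candidate learning rates are all hypothetical. First-task HPO is read here as tuning on the first task only and then keeping the chosen setting fixed, which is an assumption about the framework's details.

```python
from typing import List, Sequence, Tuple

# Toy task (hypothetical): (training targets, validation targets).
Task = Tuple[List[float], List[float]]


def train(w: float, targets: List[float], lr: float) -> float:
    """One SGD pass on the squared error (w - y)^2 / 2 for a scalar model w."""
    for y in targets:
        w -= lr * (w - y)
    return w


def loss(w: float, tasks: Sequence[Task]) -> float:
    """Mean squared error of w on the validation targets of the given tasks."""
    ys = [y for _, val in tasks for y in val]
    return sum((w - y) ** 2 for y in ys) / len(ys)


def end_of_training_hpo(stream: Sequence[Task], candidates: Sequence[float]) -> float:
    """Replay the WHOLE stream once per candidate and score at the end.

    This needs repeated access to the full stream, which a learner that
    sees the stream only once does not have.
    """
    def final_loss(lr: float) -> float:
        w = 0.0
        for train_targets, _ in stream:
            w = train(w, train_targets, lr)
        return loss(w, stream)

    return min(candidates, key=final_loss)


def first_task_hpo(stream: Sequence[Task], candidates: Sequence[float]) -> float:
    """Tune on the first task only; the chosen value is then kept fixed."""
    first = stream[0]

    def first_task_loss(lr: float) -> float:
        return loss(train(0.0, first[0], lr), [first])

    return min(candidates, key=first_task_loss)


# Toy stream of two tasks with different targets (assumed for illustration).
stream: List[Task] = [([1.0, 1.0, 1.0], [1.0]), ([3.0, 3.0, 3.0], [3.0])]
candidates = [0.1, 0.5, 1.0]

print("end-of-training HPO picks lr =", end_of_training_hpo(stream, candidates))
print("first-task HPO picks lr      =", first_task_hpo(stream, candidates))
```

In this sketch, end-of-training HPO costs one full pass over the stream per candidate, whereas first-task HPO only ever revisits the first task. On this toy stream the two frameworks happen to select different learning rates, but that says nothing about how they compare on real CL benchmarks.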
