Hyperparameter Selection in Continual Learning

Thomas L. Lee, Sigrid Passano Hellan, Linus Ericsson, Elliot J. Crowley, Amos Storkey

TL;DR

This work tackles hyperparameter selection in continual learning (CL), where the data stream cannot be accessed all at once. It benchmarks a range of realistic hyperparameter optimisation (HPO) frameworks, both static and dynamic, across standard CL methods and common benchmarks. The key finding is that no single HPO framework consistently outperforms the others on split-task or heterogeneous-task benchmarks; first-task HPO sometimes matches or slightly exceeds the others while remaining computationally efficient. Because default hyperparameters often underperform and no framework dominates, the authors recommend choosing a framework based on compute cost. They also argue that HPO for CL should be evaluated on more realistic data streams. Overall, the paper sets a baseline for realistic HPO evaluation in CL and highlights the need for more representative benchmarks.

Abstract

In continual learning (CL) -- where a learner trains on a stream of data -- standard hyperparameter optimisation (HPO) cannot be applied, as a learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparameter settings. However, this end-of-training HPO is unusable in practice since a learner can only see the stream once. Hence, there is an open question: what HPO framework should a practitioner use for a CL problem in reality? This paper looks at this question by comparing several realistic HPO frameworks. We find that none of the HPO frameworks considered, including end-of-training HPO, perform consistently better than the rest on popular CL benchmarks. We therefore arrive at a twofold conclusion: a) to be able to discriminate between HPO frameworks there is a need to move beyond the current most commonly used CL benchmarks, and b) on the popular CL benchmarks examined, a CL practitioner should use a realistic HPO framework and can select it based on factors separate from performance, for example compute efficiency.

Paper Structure

This paper contains 9 sections, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Depiction of the static end-of-training and first-task HPO frameworks, which fix the hyperparameters (HPs) throughout training. End-of-training HPO is the most common HPO framework for CL: it trains over the whole data stream for each HP configuration and then uses a validation set consisting of data from each task to select the best HPs. End-of-training HPO is unrealistic because it assumes access to the entire data stream from the start of training. First-task HPO, on the other hand, selects HPs by repeatedly training and validating on the first task only, which is possible in the real world and is more efficient.
  • Figure 2: Depiction of the current-task, seen-tasks (Mem) and seen-tasks (Val) HPO frameworks, which dynamically adapt hyperparameters (HPs) for each task. Each framework splits the data of the current task into train and validation sets. Current-task HPO uses this validation set alone to fit the HPs for the current task. In contrast, seen-tasks (Mem) and seen-tasks (Val) combine this validation set with either a sample of data from previous tasks stored in memory or the validation sets of previous tasks, respectively. Current-task and seen-tasks (Mem) HPO then retrain on the combined validation and train sets to complete learning on that task. Seen-tasks (Val) does not retrain; instead, it takes the model fitted with the best hyperparameters found as the final model for the current task. This ensures that the current task's validation set has not been trained on when it is later used to fit hyperparameters for future tasks.
  • Figure 3: Histograms of the validation accuracy at the end of training for each hyperparameter setting searched over for DER++. We look at standard CL benchmarks and heterogeneous task benchmarks, the latter identified by 'Hetero' in their name. The histograms show that performance varies widely across hyperparameter settings and that only a few settings come close to the top performance.
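
The control flow of the frameworks in Figures 1 and 2 can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the learner is a toy one-weight regression model, the only hyperparameter is the learning rate, and every name (`make_task`, `train`, `select_lr`, `first_task_hpo`, `current_task_hpo`, `seen_tasks_mem_hpo`, `mem_per_task`) is hypothetical. It also assumes that first-task HPO trains on the full first task once the learning rate is chosen, and that the memory is used only for validation; the captions do not specify either detail.

```python
import random

def make_task(slope, n=40, seed=0):
    """Toy task (assumption): noiseless regression y = slope * x, x in [-1, 1]."""
    rng = random.Random(seed)
    return [(x, slope * x) for x in (rng.uniform(-1, 1) for _ in range(n))]

def train(w, data, lr, epochs=5):
    """Plain SGD on squared error for the one-weight model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

def loss(w, data):
    """Mean squared error of the model y = w * x on data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def split(data, val_frac=0.25):
    """Split one task's data into train and validation parts."""
    k = int(len(data) * (1 - val_frac))
    return data[:k], data[k:]

def select_lr(w, train_set, val_set, grid):
    """Grid search: train from w with each lr, keep the lowest validation loss."""
    return min(grid, key=lambda lr: loss(train(w, train_set, lr), val_set))

def first_task_hpo(stream, grid, w=0.0):
    """Static: tune on the first task only, then keep the HP fixed for the stream."""
    tr, val = split(stream[0])
    lr = select_lr(w, tr, val, grid)
    for task in stream:
        w = train(w, task, lr)
    return w, [lr] * len(stream)

def current_task_hpo(stream, grid, w=0.0):
    """Dynamic: tune on the current task's validation split, then retrain on train + val."""
    chosen = []
    for task in stream:
        tr, val = split(task)
        lr = select_lr(w, tr, val, grid)
        w = train(w, task, lr)  # retrain from the pre-task weight on all task data
        chosen.append(lr)
    return w, chosen

def seen_tasks_mem_hpo(stream, grid, w=0.0, mem_per_task=5, seed=0):
    """Dynamic: validate on current-task val data plus a memory of earlier tasks."""
    rng = random.Random(seed)
    memory, chosen = [], []
    for task in stream:
        tr, val = split(task)
        lr = select_lr(w, tr, val + memory, grid)
        w = train(w, task, lr)  # retrain on train + val, as for current-task HPO
        memory += rng.sample(task, mem_per_task)  # store a small sample for later tasks
        chosen.append(lr)
    return w, chosen

# A three-task stream and a small learning-rate grid (both arbitrary choices).
stream = [make_task(1.0, seed=1), make_task(-2.0, seed=2), make_task(3.0, seed=3)]
grid = [0.001, 0.01, 0.1, 0.5]

if __name__ == "__main__":
    for hpo in (first_task_hpo, current_task_hpo, seen_tasks_mem_hpo):
        w, lrs = hpo(stream, grid)
        print(hpo.__name__, lrs, loss(w, stream[-1]))
```

The sketch also makes the compute difference visible: first-task HPO runs the grid search once, whereas the dynamic frameworks run it once per task. End-of-training HPO is omitted because it would wrap the whole loop over `stream` inside the grid search, which requires seeing the stream more than once.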