AdaCL: Adaptive Continual Learning

Elif Ceren Gok Yildirim; Murat Onur Yildirim; Mert Kilickaya; Joaquin Vanschoren

TL;DR

AdaCL investigates whether hyperparameters in Class-Incremental Learning should adapt per task. It introduces a Bayesian Optimization-based framework that, for each new task, jointly tunes the learning rate $\eta$, the regularization strength $\lambda$, and the per-task memory size $m$ based on the current and past tasks, using a validation-guided objective. Across CIFAR-100 and MiniImageNet, AdaCL improves accuracy and reduces forgetting for several base learners (EWC, LwF, iCaRL, WA), often while storing fewer exemplars. The gains are largest for the regularization-based methods, while the memory-based methods reach on-par accuracy with less memory. By showing that the selected hyperparameters change from task to task, and by providing an efficient search strategy (Tree-structured Parzen Estimators via Optuna), the work highlights the practical value of adaptive hyperparameters for continual learning systems.
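
The Tree-structured Parzen Estimator mentioned above can be sketched in a few lines: split past trials into a "good" fraction $\gamma$ and the rest, fit a Parzen (kernel) density $l(x)$ to the good points and $g(x)$ to the others, then evaluate the candidate that maximizes $l(x)/g(x)$. The sketch below is a simplified, stdlib-only illustration over a single variable (for example $\log_{10}\eta$) with a fixed Gaussian bandwidth and a hypothetical toy objective. It is not Optuna's `TPESampler`, which adds adaptive bandwidths, priors, and other refinements, and all function names here are illustrative.

```python
import math
import random


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def parzen_density(x: float, centers: list[float], bandwidth: float) -> float:
    """Equal-weight Gaussian Parzen estimator built on observed points."""
    if not centers:
        return 0.0
    return sum(gaussian_pdf(x, c, bandwidth) for c in centers) / len(centers)


def tpe_minimize(objective, low: float, high: float, n_trials: int = 40,
                 n_startup: int = 10, gamma: float = 0.25,
                 n_candidates: int = 24, seed: int = 0) -> tuple[float, float]:
    """Simplified 1-D TPE: returns the best (x, loss) found on [low, high]."""
    rng = random.Random(seed)
    bandwidth = 0.1 * (high - low)  # fixed kernel width (a simplification)
    history: list[tuple[float, float]] = []  # observed (x, loss) pairs
    for t in range(n_trials):
        if t < n_startup:
            x = rng.uniform(low, high)  # random warm-up trials
        else:
            ranked = sorted(history, key=lambda p: p[1])
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [p[0] for p in ranked[:n_good]]  # builds l(x)
            bad = [p[0] for p in ranked[n_good:]]   # builds g(x)
            best_x, best_ratio = low, -1.0
            for _ in range(n_candidates):
                # Propose from l(x): jitter a randomly chosen good point.
                c = min(max(rng.choice(good) + rng.gauss(0.0, bandwidth), low), high)
                ratio = parzen_density(c, good, bandwidth) / (
                    parzen_density(c, bad, bandwidth) + 1e-12)
                if ratio > best_ratio:
                    best_x, best_ratio = c, ratio
            x = best_x
        history.append((x, objective(x)))
    return min(history, key=lambda p: p[1])
```

For example, `tpe_minimize(lambda x: (x + 2.5) ** 2, -5.0, -1.0)` searches $\log_{10}\eta \in [-5, -1]$ for a hypothetical validation loss whose minimum sits at $\eta = 10^{-2.5}$; `10 ** best_x` converts the result back to a learning rate. In AdaCL the search runs jointly over $(\eta, \lambda, m)$ through Optuna rather than over one variable.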

Abstract

Class-Incremental Learning aims to update a deep classifier to learn new categories while maintaining or improving its accuracy on previously observed classes. Common methods to prevent forgetting previously learned classes include regularizing the neural network updates and storing exemplars in memory, which come with hyperparameters such as the learning rate, regularization strength, or the number of exemplars. However, these hyperparameters are usually tuned only at the start and then kept fixed throughout the learning sessions, ignoring the fact that newly encountered tasks may have varying levels of novelty or difficulty. This study investigates the necessity of hyperparameter 'adaptivity' in Class-Incremental Learning: the ability to dynamically adjust hyperparameters such as the learning rate, regularization strength, and memory size according to the properties of the new task at hand. We propose AdaCL, a Bayesian Optimization-based approach that automatically and efficiently determines the optimal values of these hyperparameters for each learning task. We show that adapting hyperparameters on each new task leads to improvements in accuracy, forgetting, and memory usage. Code is available at https://github.com/ElifCerenGokYildirim/AdaCL.
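
The contrast the abstract draws, tuning once and freezing versus re-tuning at every task, can be illustrated with a toy schedule. In the sketch below, `toy_validation_score` is a hypothetical stand-in for "train on the task, then score on validation data of the current and past tasks"; its optimum deliberately shifts from task to task. An exhaustive grid search replaces the paper's Bayesian Optimization to keep the example deterministic. The grid, the score, and all function names are illustrative assumptions, not AdaCL's code.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HParams:
    lr: float   # learning rate (eta)
    lam: float  # regularization strength (lambda)
    mem: int    # exemplars kept for this task (m)


# Illustrative search grid (an assumption, not the paper's search space).
GRID = [HParams(lr, lam, mem)
        for lr, lam, mem in itertools.product(
            (1e-3, 1e-2, 1e-1), (0.1, 1.0, 10.0, 100.0), (200, 500, 1000))]


def toy_validation_score(task_id: int, hp: HParams) -> float:
    """Hypothetical stand-in for training plus validation on a task.
    Its optimum moves with task_id, so no fixed setting is best for all tasks."""
    best_log_lam = task_id - 1.0        # later tasks prefer stronger regularization
    best_mem = 1000.0 / (task_id + 1)   # ...and fewer exemplars
    return -((math.log10(hp.lam) - best_log_lam) ** 2
             + (math.log10(hp.lr) + 2.0) ** 2
             + ((hp.mem - best_mem) / 1000.0) ** 2)


def tune(task_id: int) -> HParams:
    """Exhaustive grid search; a stand-in for the per-task Bayesian search."""
    return max(GRID, key=lambda hp: toy_validation_score(task_id, hp))


def adaptive_schedule(n_tasks: int) -> list[HParams]:
    """Re-tune at the start of every task."""
    return [tune(t) for t in range(n_tasks)]


def fixed_schedule(n_tasks: int) -> list[HParams]:
    """Tune once on the first task and keep those values for all tasks."""
    first = tune(0)
    return [first] * n_tasks


def total_score(schedule: list[HParams]) -> float:
    """Sum of per-task toy scores for a schedule."""
    return sum(toy_validation_score(t, hp) for t, hp in enumerate(schedule))
```

Here `adaptive_schedule(4)` picks $\lambda = 0.1, 1, 10, 100$ and $m = 1000, 500, 200, 200$ for the four toy tasks, while `fixed_schedule(4)` reuses the first task's setting and obtains a much lower `total_score`. Whether real task sequences shift in this way is the empirical question the paper studies.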

Paper Structure (21 sections, 5 equations, 7 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Method
Base Models for AdaCL
Constancy Assumption in Class Incremental Learning
AdaCL: Adaptive Continual Learning
Bayesian Optimization via Parzen Estimator
Experimental Protocol
Metrics.
Baselines.
Implementation Details.
Experimental Results
The Effect of Adaptivity.
Comparison with Recent Baselines.
Memory Allocation.
...and 6 more sections

Figures (7)

Figure 1: Comparison of fixed vs. adaptive continual learning (AdaCL). In this work, we hypothesize that different tasks may require different settings and explore the potential of tuning the learning rate ($\eta$), regularization strength ($\lambda$), and memory size ($m$) per task, allowing the learner to adapt to each task.
Figure 2: Accuracy after each task on CIFAR100. AdaCL significantly boosts the performance on regularization-based methods and improves the efficiency by storing fewer exemplars on memory-based methods while yielding on par performance.
Figure 3: Accuracy after each task on MiniImageNet. The results align with the observations on CIFAR100.
Figure 4: Adaptive modifications in regularization strength, learning rate, and memory allocation. The selected hyperparameters vary across task sequences, datasets, and methods, indicating the need for adaptivity in CL.
Figure 5: t-SNE plots of selected exemplars. Ada-WA selects exemplars from both the class boundaries and the class center, which lets it achieve on-par performance with less memory. The final task is omitted from the visualization, since no memory selection is needed after it.
...and 2 more figures
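
Once the per-task memory size $m$ is chosen, a rule is still needed for which exemplars to keep. A common rule in memory-based methods such as iCaRL is herding: greedily add the sample that keeps the running mean of the selected features closest to the class mean. The sketch below is a plain-Python version of that rule; splitting $m$ evenly across a task's classes (the hypothetical `select_task_memory` helper) is an assumption for illustration, and whether Ada-WA uses exactly this selection rule is not stated here.

```python
import math


def _mean(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _dist(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def herding_select(features: list[list[float]], m: int) -> list[int]:
    """Greedy herding: return up to m distinct indices, added one at a time so
    that the mean of the selected features stays closest to the class mean."""
    mu = _mean(features)
    chosen: list[int] = []
    running_sum = [0.0] * len(mu)
    for k in range(1, min(m, len(features)) + 1):
        best_i, best_d = -1, math.inf
        for i, f in enumerate(features):
            if i in chosen:
                continue  # each sample is stored at most once
            candidate_mean = [(s + x) / k for s, x in zip(running_sum, f)]
            d = _dist(mu, candidate_mean)
            if d < best_d:  # strict '<' keeps the earliest index on ties
                best_i, best_d = i, d
        chosen.append(best_i)
        running_sum = [s + x for s, x in zip(running_sum, features[best_i])]
    return chosen


def select_task_memory(features_by_class: dict[int, list[list[float]]],
                       m: int) -> dict[int, list[int]]:
    """Hypothetical helper: split a task's budget m evenly across its classes."""
    per_class = m // len(features_by_class)
    return {c: herding_select(feats, per_class)
            for c, feats in features_by_class.items()}
```

With the five 2-D points `[[0, 0], [2, 0], [0, 2], [2, 2], [1, 1]]` and `m = 3`, `herding_select` returns `[4, 0, 3]`: the center point first, then a corner, then the opposite corner that pulls the running mean back onto the class mean.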
