Improving Continual Learning Performance and Efficiency with Auxiliary Classifiers

Filip Szatkowski; Yaoyue Zheng; Fei Yang; Bartłomiej Twardowski; Tomasz Trzciński; Joost van de Weijer

TL;DR

Auxiliary classifiers (ACs) attached to intermediate layers mitigate catastrophic forgetting in continual learning (CL) by exploiting the stability of early representations. ACs enable dynamic, early-exit inference and can be trained alongside standard CL models, yielding roughly a 10% relative accuracy improvement on CIFAR100 and ImageNet100 and reducing inference cost by 10-60%. The approach scales across architectures (ResNet, VGG, ViT) and remains beneficial for various CL methods, including exemplar-based and regularization-based strategies. This work offers a practical, scalable means to enhance continual learning performance and efficiency in resource-constrained settings, backed by extensive experiments and reproducible code.

Abstract

Continual learning is crucial for applying machine learning in challenging, dynamic, and often resource-constrained environments. However, catastrophic forgetting - overwriting previously learned knowledge when new information is acquired - remains a major challenge. In this work, we examine the intermediate representations in neural network layers during continual learning and find that such representations are less prone to forgetting, highlighting their potential to accelerate computation. Motivated by these findings, we propose to use auxiliary classifiers (ACs) to enhance performance and demonstrate that integrating ACs into various continual learning methods consistently improves accuracy across diverse evaluation settings, yielding an average 10% relative gain. We also leverage the ACs to reduce the average inference cost by 10-60% without compromising accuracy, enabling the model to return predictions before computing all the layers. Our approach provides a scalable and efficient solution for continual learning.
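
The idea of returning a prediction before all layers are computed can be illustrated with a minimal sketch. It assumes a confidence-threshold exit rule and a fallback to the most confident classifier when no threshold is met; both are illustrative assumptions, not necessarily the rule used in the paper. The function name `early_exit_predict`, the `threshold` parameter, and the precomputed logits are hypothetical. In a real network, each classifier's logits would be computed lazily, so later layers are skipped after an exit.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(classifier_logits, threshold=0.9):
    """Return (predicted_class, exit_index).

    classifier_logits: one list of logits per classifier, ordered from the
    earliest auxiliary classifier to the final classifier (hypothetical input
    format; logits are precomputed here only to keep the sketch standalone).
    """
    best_conf, best_pred = -1.0, None
    for idx, logits in enumerate(classifier_logits):
        probs = softmax(logits)
        conf = max(probs)
        pred = probs.index(conf)
        if conf >= threshold:
            # Confident enough: stop here and skip the remaining classifiers.
            return pred, idx
        if conf > best_conf:
            best_conf, best_pred = conf, pred
    # Assumed fallback: no classifier met the threshold, so use the most
    # confident prediction seen, at the cost of the full network.
    return best_pred, len(classifier_logits) - 1
```

Lowering `threshold` makes the model exit earlier and saves more computation; raising it pushes more samples toward the later classifiers.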

Paper Structure (50 sections, 1 equation, 45 figures, 13 tables)

Introduction
Related works
Intermediate representations in CL
Intermediate representations are more stable
Early layer classifiers perform better on old data
Continually trained networks overthink more
End-to-end training improves ACs
Enhancing CL through ACs
Combining from multiple classifier predictions
Dynamic inference with ACs
AC-enhanced CL methods
Experimental results
Improved CL with ACs
Standard CL benchmarks
Leveraging ACs for dynamic inference
...and 35 more sections

Figures (45)

Figure 1: We integrate auxiliary classifiers (ACs) into various CL methods, enabling dynamic inference and reducing the inference cost. We measure their accuracy relative to the standard network at different computational budgets and show that AC-enhanced methods match the performance of the standard counterparts at only 50-80% cost, and improve their performance at higher computational budgets. The accuracy of AC-enhanced models saturates at 80-90% computation, allowing us to save 10-20% of the inference cost without sacrificing accuracy.
Figure 2: CKA of the first task representations across different ResNet32 layers (L1.B3-L3.B5) through continual learning on CIFAR100 split into 10 tasks. Representations at the early layers remain more similar throughout continual learning, hinting at greater stability that could be leveraged to improve performance by incorporating auxiliary classifiers at the intermediate layers.
Figure 3: Per-task difference (only positives) in accuracy between the auxiliary classifiers (ACs), trained with linear probing on intermediate layers, and the final classifier. Surprisingly, in continual learning, some intermediate classifiers can significantly outperform the final classifier on the old task data, especially for exemplar-free methods (FT and LwF).
Figure 4: Overthinking and AC performance analysis on CIFAR100x10. Overthinking refers to a case where samples correctly classified by early classifiers are misclassified later by the final classifier. (a) Overthinking is much more prominent in continual learning methods than in standard joint training, which indicates that the accuracy of continual learning could be greatly improved through ACs. (b) Each classifier correctly classifies a significant portion of the samples misclassified by the final classifier. (c) Subsets of samples can be correctly classified only by a single given AC. (d) Training ACs together with final networks improves the performance of most classifiers.
Figure 5: Overview of the network enhanced with ACs in continual learning. The early layers exhibit less forgetting on the old tasks, and can return the correct prediction (✓) in cases where the final classifier fails (✗), and save computations.
...and 40 more figures
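
The notion of overthinking described in the Figure 4 caption (samples that an early classifier gets right but the final classifier gets wrong) can be sketched as a simple rate. This is a minimal illustration under stated assumptions: the function name `overthinking_rate`, the input format, and the normalization by the total number of samples are hypothetical, and the paper may count or normalize differently.

```python
def overthinking_rate(ac_correct, final_correct):
    """Fraction of samples that at least one auxiliary classifier classifies
    correctly while the final classifier misclassifies them.

    ac_correct: list over samples; each entry is a list of booleans, one per
        auxiliary classifier (hypothetical input format).
    final_correct: list of booleans, one per sample.
    """
    if len(ac_correct) != len(final_correct):
        raise ValueError("ac_correct and final_correct must have equal length")
    if not final_correct:
        return 0.0
    overthought = sum(
        1
        for acs, final_ok in zip(ac_correct, final_correct)
        if any(acs) and not final_ok
    )
    # Assumed normalization: divide by the total number of samples.
    return overthought / len(final_correct)
```

For example, with four samples where only the first is correct under some AC but wrong under the final classifier, the function returns 0.25.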
