A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning

Da-Wei Zhou, Qi-Wei Wang, Han-Jia Ye, De-Chuan Zhan

TL;DR

This work tackles memory-efficient class-incremental learning (CIL) and argues that stored models must be counted in the memory budget for comparisons to be fair. It analyzes how shallow and deep network layers contribute differently to continual learning and introduces MEMO, a Memory-efficient Expandable MOdel that shares generalized blocks across tasks while adding a specialized block for each new task. In experiments on CIFAR100 and ImageNet subsets, MEMO performs competitively under a range of memory budgets. The work also proposes holistic evaluation metrics (AUC, APM) that account for accuracy and memory size together, and it offers practical guidance on how to split a memory budget between exemplars and models in CIL.

Abstract

Real-world applications require the classification model to adapt to new classes without forgetting old ones. Correspondingly, Class-Incremental Learning (CIL) aims to train a model with limited memory size to meet this requirement. Typical CIL methods tend to save representative exemplars from former classes to resist forgetting, while recent works find that storing models from history can substantially boost performance. However, the stored models are not counted in the memory budget, which implicitly results in unfair comparisons. We find that when the model size is counted in the total budget and methods are compared at aligned memory sizes, saving models does not consistently work, especially when the memory budget is limited. As a result, we need to holistically evaluate different CIL methods at different memory scales and simultaneously consider accuracy and memory size for measurement. On the other hand, we dive deeply into the construction of the memory buffer for memory efficiency. By analyzing the effect of different layers in the network, we find that shallow and deep layers have different characteristics in CIL. Motivated by this, we propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel. MEMO extends specialized layers based on the shared generalized representations, efficiently extracting diverse representations at modest cost while maintaining representative exemplars. Extensive experiments on benchmark datasets validate MEMO's competitive performance. Code is available at: https://github.com/wangkiw/ICLR23-MEMO
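
The idea of counting a stored model in the same budget as exemplars can be made concrete by converting model bytes into an equivalent number of images. The sketch below is only illustrative: the function name, the parameter count (463,504, roughly a CIFAR-style ResNet32), 4-byte float parameters, and 1-byte-per-channel 32x32 RGB images are all assumptions that are not stated in the abstract. Under these assumptions the result appears consistent with the "603 exemplars" in the title.

```python
def model_to_exemplars(num_params, bytes_per_param=4,
                       image_shape=(3, 32, 32), bytes_per_value=1):
    """Return how many whole images occupy the same memory as one stored model.

    All defaults are assumptions for illustration:
    float32 parameters and uint8 RGB images of size 32x32.
    """
    model_bytes = num_params * bytes_per_param
    channels, height, width = image_shape
    image_bytes = channels * height * width * bytes_per_value
    # Integer division: only whole exemplars can be stored.
    return model_bytes // image_bytes


# Assumed parameter count (hypothetical value for a CIFAR-style ResNet32).
ASSUMED_NUM_PARAMS = 463_504

if __name__ == "__main__":
    print(model_to_exemplars(ASSUMED_NUM_PARAMS))  # -> 603
```

With this conversion, a model-based method that keeps one extra backbone would, under the same total budget, have to give up about that many exemplars, which is the kind of alignment the abstract argues for.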

Paper Structure (39 sections, 4 equations, 31 figures, 25 tables)

This paper contains 39 sections, 4 equations, 31 figures, 25 tables.

Figures (31)

  • Figure 1: The average accuracy of different methods when the memory size is varied from small to large. The start point corresponds to the memory size of exemplar-based methods with the benchmark backbone (WA \cite{zhao2020maintaining}, iCaRL \cite{rebuffi2017icarl}, Replay \cite{chaudhry2019continual}), and the endpoint corresponds to the memory cost of model-based methods with the benchmark backbone (DER \cite{yan2021dynamically} and Memo, our proposed method). We align the memory cost by using a smaller model for model-based methods or adding exemplars for exemplar-based methods. 'Base' stands for the number of classes in the first task, and 'Inc' represents the number of classes in each incremental new task. See Section \ref{sec:setup} and \ref{sec:comparison} for more details.
  • Figure 2: Performance of different methods when fixing the total budget and varying the ratio of model size to total memory size on CIFAR100.
  • Figure 3: Left: gradient norm of different residual blocks when optimizing Eq. \ref{eq:icarl}. Deeper layers have larger gradients, while shallow layers have small gradients. Middle: shift between the first and last epoch of different residual blocks. Deeper layers change more, while shallow layers change less. Right: feature similarity (CKA) of different backbones learned by Eq. \ref{eq:der}. The lower triangular matrix denotes the similarity between deeper layers; the upper triangular matrix denotes the similarity between shallow layers.
  • Figure 4: An overview of three typical methods. Left: Exemplar-based methods train a single model. Middle: Model-based methods train a new model per new task. Right: Memo trains a new specialized block per new task. When aligning the memory cost of these methods, exemplar-based methods can save the most exemplars, while model-based methods have the least. Memo strikes a trade-off between exemplar and model buffer.
  • Figure 5: Experiments on specialized and generalized blocks. Specialized blocks should be fixed, whereas whether to fix the generalized blocks depends on the number of classes in the base stage. A block written with a bar, $\bar{\phi}$, is frozen; a block without a bar is trainable.
  • ...and 26 more figures
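
Figure 1 plots average accuracy against memory size, and the TL;DR mentions an AUC metric for comparing methods across memory scales. One plausible way to reduce such a curve to a single number is the trapezoidal area under it. The sketch below is an assumption-laden illustration, not the paper's definition: the function name is hypothetical, the units and any normalization used by the authors are not given in this excerpt, and the numbers in the usage line are made up.

```python
def accuracy_memory_auc(memory_sizes, accuracies):
    """Trapezoidal area under an accuracy-versus-memory curve.

    memory_sizes: memory budgets at which a method was evaluated (any unit).
    accuracies:   the accuracy measured at each budget, in the same order.
    """
    if len(memory_sizes) != len(accuracies):
        raise ValueError("memory_sizes and accuracies must have equal length")
    if len(memory_sizes) < 2:
        raise ValueError("at least two points are needed to form a curve")

    # Sort by memory size so the curve is traversed left to right.
    points = sorted(zip(memory_sizes, accuracies))

    area = 0.0
    for (m0, a0), (m1, a1) in zip(points, points[1:]):
        # Area of one trapezoid between two neighbouring budgets.
        area += 0.5 * (a0 + a1) * (m1 - m0)
    return area


if __name__ == "__main__":
    # Hypothetical budgets and accuracies, for illustration only.
    print(accuracy_memory_auc([1, 2, 3], [50.0, 60.0, 70.0]))  # -> 120.0
```

A larger area means a method stays accurate across the whole range of budgets rather than at a single memory size, which is the comparison the accuracy-memory curves are meant to support.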