iCaRL: Incremental Classifier and Representation Learning

Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, Christoph H. Lampert

TL;DR

iCaRL tackles the problem of class-incremental learning by jointly learning classifiers and representations from a data stream. It introduces three core components: a nearest-mean-of-exemplars classifier, a herding-based exemplar selection strategy, and a distillation-guided representation update with prototype rehearsal. Experiments on CIFAR-100 and ImageNet demonstrate that iCaRL can incrementally learn many classes with strong accuracy, outperforming finetuning, fixed representations, and distillation-only baselines, especially as the class batch size decreases. While still below batch-training performance, iCaRL provides a practical, scalable approach to life-long visual recognition with bounded memory and computation.
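The first two components named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. It assumes feature vectors have already been extracted by the network and are L2-normalized, and the function names (`herding_select`, `class_prototypes`, `nme_predict`) are hypothetical. Herding greedily adds the sample that keeps the running mean of the chosen exemplars closest to the class mean. The nearest-mean-of-exemplars rule then assigns a test sample to the class whose exemplar mean (prototype) is closest.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm (assumed feature convention)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def herding_select(features, m):
    """Greedy herding: pick m exemplars whose running mean tracks the class mean.

    features: (n, d) feature vectors of one class. Returns indices in
    selection order, so any prefix is itself a usable exemplar set.
    """
    feats = l2_normalize(features)
    mu = feats.mean(axis=0)
    m = min(m, len(feats))
    selected = []
    running_sum = np.zeros_like(mu)
    for k in range(1, m + 1):
        # Mean of the exemplar set if each candidate were added as the k-th element.
        candidate_means = (running_sum + feats) / k
        dists = np.linalg.norm(mu - candidate_means, axis=1)
        if selected:
            dists[selected] = np.inf  # never pick the same sample twice
        best = int(np.argmin(dists))
        selected.append(best)
        running_sum += feats[best]
    return selected


def class_prototypes(exemplar_sets):
    """One prototype per class: the re-normalized mean of its exemplar features."""
    return np.stack([l2_normalize(l2_normalize(e).mean(axis=0)) for e in exemplar_sets])


def nme_predict(features, prototypes):
    """Nearest-mean-of-exemplars: label each sample with its closest prototype."""
    feats = l2_normalize(features)
    dists = np.linalg.norm(feats[:, None, :] - prototypes[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Toy usage: two well-separated 2-D "feature" clusters.
rng = np.random.default_rng(0)
class_a = rng.normal([5.0, 0.0], 0.5, size=(50, 2))
class_b = rng.normal([0.0, 5.0], 0.5, size=(50, 2))
protos = class_prototypes([class_a[herding_select(class_a, 5)],
                           class_b[herding_select(class_b, 5)]])
print(nme_predict(np.array([[4.0, 0.5], [0.2, 6.0]]), protos))  # [0 1]
```

Because `herding_select` returns indices in selection order, shrinking an exemplar set to fit a tighter memory budget amounts to keeping a prefix of that list.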

Abstract

A major open problem on the road to artificial intelligence is the development of incrementally learning systems that learn about more and more concepts over time from a stream of data. In this work, we introduce a new training strategy, iCaRL, that allows learning in such a class-incremental way: only the training data for a small number of classes has to be present at the same time and new classes can be added progressively. iCaRL learns strong classifiers and a data representation simultaneously. This distinguishes it from earlier works that were fundamentally limited to fixed data representations and therefore incompatible with deep learning architectures. We show by experiments on CIFAR-100 and ImageNet ILSVRC 2012 data that iCaRL can learn many classes incrementally over a long period of time where other strategies quickly fail.
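Learning the representation jointly with the classifier relies on the distillation-guided update mentioned in the TL;DR. One way to read it, sketched below under stated assumptions, is as a sum of per-class sigmoid binary cross-entropies: new classes get ordinary one-hot targets, while previously learned classes get the outputs the network produced before the update as soft targets. The sketch only computes the loss value in NumPy. Names such as `icarl_targets` and `icarl_loss` are hypothetical, and an actual training step would backpropagate this loss through the network in a deep learning framework.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def icarl_targets(labels, old_logits, num_total):
    """Build per-class sigmoid targets for one mini-batch.

    labels:     (n,) integer labels over all classes seen so far.
    old_logits: (n, s) outputs of the network *before* the update,
                where s is the number of previously learned classes.
    """
    n, num_old = old_logits.shape
    targets = np.zeros((n, num_total))
    # Old classes: distillation targets = sigmoid outputs of the old network.
    targets[:, :num_old] = sigmoid(old_logits)
    # New classes: one-hot targets (all zeros for exemplars of old classes).
    rows = np.flatnonzero(labels >= num_old)
    targets[rows, labels[rows]] = 1.0
    return targets


def icarl_loss(new_logits, targets, eps=1e-12):
    """Per-class binary cross-entropies, summed over classes, averaged over the batch."""
    probs = sigmoid(new_logits)
    bce = targets * np.log(probs + eps) + (1.0 - targets) * np.log(1.0 - probs + eps)
    return float(-bce.sum(axis=1).mean())
```

Using sigmoids rather than a softmax keeps each class output independent, so the old-class targets can be copied from the previous network without renormalizing them against the new classes.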

Paper Structure

This paper contains 23 sections, 2 equations, 8 figures, 1 table, 5 algorithms.

Figures (8)

  • Figure 1: Class-incremental learning: an algorithm learns continuously from a sequential data stream in which new classes occur. At any time, the learner is able to perform multi-class classification for all classes observed so far.
  • Figure 2: Experimental results of class-incremental training on iCIFAR-100 and iILSVRC: reported are multi-class accuracies across all classes observed up to a certain time point. iCaRL clearly outperforms the other methods in this setting. Fixing the data representation after having trained on the first batch (fixed repr.) performs worse than distillation-based LwF.MC, except for iILSVRC-full. Finetuning the network without preventing catastrophic forgetting (finetuning) achieves the worst results. For comparison, the same network trained with all data available achieves 68.6% multi-class accuracy.
  • Figure 3: Confusion matrices of different methods on iCIFAR-100 (with entries transformed by $\log(1+x)$ for better visibility). iCaRL's predictions are distributed close to uniformly over all classes, whereas LwF.MC tends to predict classes from recent batches more frequently. The classifier with fixed representation has a bias towards classes from the first batch, while the network trained by finetuning predicts exclusively class labels from the last batch.
  • Figure 4: Average incremental accuracy on iCIFAR-100 with 10 classes per batch for different memory budgets $K$.
  • Figure 5: Confusion matrix for iCaRL on iILSVRC-large (1000 classes in batches of 100).
  • ...and 3 more figures
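The evaluation quantities referenced in the figure captions are simple to compute. The sketch below is illustrative only: it assumes integer class labels, treats average incremental accuracy as the mean of the multi-class accuracies measured after each class batch, and assumes the memory budget $K$ is split evenly over the classes seen so far. All function names are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add handles repeated pairs
    return cm


def multiclass_accuracy(cm):
    """Fraction of correct predictions over all classes observed so far."""
    return float(np.trace(cm) / cm.sum())


def log_scaled(cm):
    """log(1 + x) display transform: shrinks large diagonal counts so errors stay visible."""
    return np.log1p(cm)


def average_incremental_accuracy(accuracies):
    """Mean of the accuracies recorded after each class batch."""
    return float(np.mean(accuracies))


def exemplars_per_class(K, num_classes_seen):
    """Hypothetical even split of a fixed exemplar budget K."""
    return K // num_classes_seen
```

Under an even split, the per-class exemplar count shrinks as more classes arrive while the total memory stays bounded by $K$.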