Continual Learning on a Data Diet

Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren

TL;DR

This work explores learning from important samples in continual learning, investigates the learning-forgetting dynamics and the mechanisms behind the improved stability-plasticity balance, and presents several significant observations.

Abstract

Continual Learning (CL) methods usually learn from all available data. Human cognition, by contrast, efficiently focuses on key experiences while disregarding redundant information. Similarly, not all data points in a dataset have equal potential; some are more informative than others. This disparity may significantly affect performance, as both the quality and quantity of samples directly influence the model's generalizability and efficiency. Drawing inspiration from this, we explore the potential of learning from important samples and present an empirical study evaluating coreset selection techniques in the context of CL, to stimulate research in this unexplored area. We train different continual learners on increasing amounts of selected samples and investigate the learning-forgetting dynamics, shedding light on the mechanisms behind their improved stability-plasticity balance. We present several significant observations: learning from selectively chosen samples (i) enhances incremental accuracy, (ii) improves knowledge retention of previous tasks, and (iii) refines learned representations. This analysis contributes to a deeper understanding of selective learning strategies in CL scenarios.
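To make "training on increasing amounts of selected samples" concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each training sample already has an importance score, and it keeps the top-scoring fraction per class; the class-balanced budget, the function names `select_coreset` and `random_scores`, and the toy data are all illustrative assumptions. Random scores stand in for a Random selection baseline; a real selection method would supply its own scores.

```python
import random
from collections import defaultdict


def select_coreset(labels, scores, fraction):
    """Return sorted indices of the highest-scoring `fraction` of samples per class.

    Assumption: selection is class-balanced, which the source does not specify.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    keep = []
    for indices in by_class.values():
        # Keep at least one sample per class.
        budget = max(1, round(fraction * len(indices)))
        ranked = sorted(indices, key=lambda i: scores[i], reverse=True)
        keep.extend(ranked[:budget])
    return sorted(keep)


def random_scores(n, seed=0):
    """Hypothetical stand-in scores: uniform random values, i.e. random selection."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Toy task with two classes of ten samples each.
labels = [0] * 10 + [1] * 10
scores = random_scores(len(labels))

# Sweep over increasing coreset fractions, as in a data-diet study.
for fraction in (0.1, 0.5, 1.0):
    subset = select_coreset(labels, scores, fraction)
    print(f"fraction={fraction}: {len(subset)} samples kept")
```

In a class-incremental setting, a selection step of this kind would run once per task, before the continual learner trains on the retained indices.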

Paper Structure

This paper contains 37 sections, 1 equation, 6 figures, 6 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of our evaluation protocol: Existing class-incremental learning methods (left) typically utilize all available samples indiscriminately during training. In this study (right), we subject class-incremental learners to a data diet and analyze how the selection of the most important samples with different coreset selection methods affects the incremental performance.
  • Figure 2: Accuracy [%] after each learning step on LwF (above) shows that Random selection forgets relatively less while still learning effectively. The difference is linked to abrupt parameter changes: for example, in the last layer between consecutive tasks (below), Uncertainty and GraphCut shift the parameters abruptly.
  • Figure 3: Accuracy [%] of each task after every learning session on different class-incremental learning methods with Split-CIFAR10. The comparison shows performance using all samples vs. the best-performing coreset selection, which may involve different coreset fractions. The improved accuracy is attributed to reduced forgetting.
  • Figure 4: Saliency maps from the first encountered task after completing all learning sessions. Models trained with selected coresets are better at capturing the important parts of an input. Note that we select the top-performing coreset selection methods across different class-incremental learners.
  • Figure 5: DER's representation of all classes on Split-CIFAR10 with varying coresets selected with GraphCut, compared to the full samples. When trained with coresets, it exhibits a superior ability to form distinct representations.
  • ...and 1 more figure