Hybrid Memory Replay: Blending Real and Distilled Data for Class Incremental Learning

Jiangtao Kong, Jiacheng Shi, Ashley Gao, Shaohan Hu, Tianyi Zhou, Huajie Shao

TL;DR

The paper proposes a modification to data distillation (DD) that distills synthetic data from a sliding window of historical checkpoints (rather than from checkpoints on multiple training trajectories). The resulting hybrid memory significantly outperforms existing replay-based baselines and can be seamlessly integrated into most existing replay-based class-incremental learning (CIL) models.

Abstract

Incremental learning (IL) aims to acquire new knowledge from current tasks while retaining knowledge learned from previous tasks. Replay-based IL methods store a set of exemplars from previous tasks in a buffer and replay them when learning new tasks. However, the buffer is usually size-limited and cannot store enough real exemplars to retain the knowledge of previous tasks. In contrast, data distillation (DD) can reduce the exemplar buffer's size by condensing a large real dataset into a much smaller set of more information-compact synthetic exemplars. Nevertheless, DD's performance gain on IL quickly vanishes as the number of synthetic exemplars grows. To overcome the weaknesses of both real-data and synthetic-data buffers, we instead optimize a hybrid memory that includes both types of data. Specifically, we propose a modification to DD that distills synthetic data from a sliding window of historical checkpoints (rather than from checkpoints on multiple training trajectories). Conditioned on the synthetic data, we then optimize the selection of real exemplars to provide complementary improvement to the DD objective. The optimized hybrid memory combines the strengths of synthetic and real exemplars, effectively mitigating catastrophic forgetting in class-incremental learning (CIL) when the exemplar buffer is limited. Notably, our method can be seamlessly integrated into most existing replay-based CIL models. Extensive experiments across multiple benchmarks demonstrate that our method significantly outperforms existing replay-based baselines.
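The key change to DD described above is where the checkpoints come from: a sliding window over the single training history of the CIL model, instead of many separately trained trajectories. The sketch below shows only that bookkeeping, assuming a trajectory-matching-style distillation loss that consumes a (start, target) checkpoint pair drawn a fixed number of steps apart. `CheckpointWindow`, `sample_pair`, and `gap` are hypothetical names, checkpoints are stand-in strings, and the distillation loss itself is omitted.

```python
import random
from collections import deque
from typing import Any, Deque, Tuple


class CheckpointWindow:
    """Hypothetical helper: keeps only the most recent `size` checkpoints
    from the single training run of the CIL model."""

    def __init__(self, size: int) -> None:
        if size < 2:
            raise ValueError("need at least two checkpoints to form a pair")
        # A deque with maxlen silently drops the oldest checkpoint on overflow,
        # which is what makes the window "slide" as training proceeds.
        self._buf: Deque[Any] = deque(maxlen=size)

    def push(self, checkpoint: Any) -> None:
        self._buf.append(checkpoint)

    def __len__(self) -> int:
        return len(self._buf)

    def sample_pair(self, gap: int, rng: random.Random) -> Tuple[Any, Any]:
        """Draw a (start, target) pair `gap` steps apart inside the window,
        as an assumed trajectory-matching loss would consume it."""
        if not 1 <= gap < len(self._buf):
            raise ValueError("gap must be in [1, len(window) - 1]")
        start = rng.randrange(len(self._buf) - gap)
        return self._buf[start], self._buf[start + gap]


# Usage: record one checkpoint per epoch (epoch labels stand in for weights).
window = CheckpointWindow(size=5)
for epoch in range(12):
    window.push(f"ckpt_epoch_{epoch}")

rng = random.Random(0)
start, target = window.sample_pair(gap=2, rng=rng)
print(len(window), start, target)
```

Because only the last five checkpoints survive, every sampled pair comes from recent history (epochs 7 to 11 here), so no extra training runs are needed to supply distillation targets.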

Paper Structure

This paper contains 17 sections, 2 theorems, 14 equations, 6 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

Under Assumptions ass:local_bound and ass:hist_bound, if $\epsilon_{t+1}\geq \frac{\rho}{1-\epsilon_t}$, then the model trained on the hybrid memory of all previous tasks achieves performance comparable to that of the model trained on the real dataset of all previous tasks...
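The sufficient condition in Theorem 1 is a simple scalar inequality, so it can be checked directly. The sketch below treats $\epsilon_t$, $\epsilon_{t+1}$, and $\rho$ as plain numbers; their meaning comes from the paper's assumptions, which this summary does not reproduce, and the restriction $0 \le \epsilon_t < 1$ is an assumption of the sketch (it keeps the threshold finite). The function names are hypothetical.

```python
def theorem1_threshold(eps_t: float, rho: float) -> float:
    """Right-hand side rho / (1 - eps_t) of the condition in Theorem 1.

    Assumes 0 <= eps_t < 1 (an assumption of this sketch) so the value is finite.
    """
    if not 0.0 <= eps_t < 1.0:
        raise ValueError("this sketch assumes 0 <= eps_t < 1")
    return rho / (1.0 - eps_t)


def theorem1_condition_holds(eps_t: float, eps_next: float, rho: float) -> bool:
    """True when eps_{t+1} >= rho / (1 - eps_t)."""
    return eps_next >= theorem1_threshold(eps_t, rho)


# The threshold grows as eps_t approaches 1: a larger eps_t demands a larger eps_{t+1}.
for eps_t in (0.0, 0.5, 0.9):
    print(eps_t, theorem1_threshold(eps_t, rho=0.1))

print(theorem1_condition_holds(eps_t=0.5, eps_next=0.25, rho=0.1))  # True
```

With $\rho = 0.1$ and $\epsilon_t = 0.5$ the threshold is $0.1 / 0.5 = 0.2$, so $\epsilon_{t+1} = 0.25$ satisfies the condition while $\epsilon_{t+1} = 0.15$ would not.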

Figures (6)

  • Figure 1: Performance comparison between the real memory used by the original CIL methods and our hybrid memory across multiple baselines (iCaRL [rebuffi2017icarl], BEEF [wang2022beef], and FOSTER [wang2022foster]), all with the same exemplar buffer size on CIFAR-100 [krizhevsky2009learning].
  • Figure 2: Performance of iCaRL [rebuffi2017icarl] with different exemplar buffer sizes for real memory, synthetic memory, and our hybrid memory. "Real memory" refers to buffers containing only real exemplars selected by iCaRL; "synthetic memory" contains only synthetic exemplars generated by Continual Data Distillation (CDD).
  • Figure 3: The framework of the proposed hybrid memory system for replay-based CIL. We first update the model using the current real data together with the hybrid memory of former classes. Then we use Continual Data Distillation (❶) to extract synthetic exemplars and conditional real data selection (❷) to choose optimal real exemplars conditioned on the synthetic data. Finally, the synthetic exemplars and the selected real exemplars are combined to update the hybrid memory.
  • Figure 4: LAA and AIA of iCaRL with the proposed hybrid memory at different synthetic exemplar ratios. "LAA" refers to the last average accuracy; "AIA" refers to the average incremental accuracy.
  • Figure 5: Performance comparison of the hybrid memory using different selection methods with iCaRL, BEEF, and FOSTER on CIFAR-100, all using the same exemplar buffer size.
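Figure 3 describes a two-stage memory update (distill synthetic exemplars, then choose real exemplars conditioned on them), and Figure 4 varies the share of buffer slots given to synthetic data. The sketch below illustrates only the conditioning idea, under loudly stated assumptions: exemplars are represented by feature vectors, and the selection rule is a herding-style greedy match of the hybrid set's mean feature to the mean feature of all real data of the class. That criterion is a stand-in swapped in for illustration, not the paper's objective, and `split_budget` and `select_real_conditioned` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Coordinate-wise mean of a non-empty list of feature vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_budget(slots: int, synthetic_ratio: float) -> Tuple[int, int]:
    """Hypothetical helper: split a per-class slot budget into (synthetic, real)."""
    n_syn = round(slots * synthetic_ratio)
    return n_syn, slots - n_syn


def select_real_conditioned(real_feats: Sequence[Sequence[float]],
                            syn_feats: Sequence[Sequence[float]],
                            k: int) -> List[int]:
    """Herding-style stand-in: greedily pick k real exemplars so that the mean of
    (synthetic + chosen real) features tracks the mean of all real features."""
    target = mean(real_feats)
    # The running sum starts from the synthetic exemplars: this is the conditioning.
    run_sum = [sum(col) for col in zip(*syn_feats)] if syn_feats else [0.0] * len(target)
    count = len(syn_feats)
    chosen: List[int] = []
    remaining = list(range(len(real_feats)))
    for _ in range(min(k, len(real_feats))):
        best_i, best_err = remaining[0], float("inf")
        for i in remaining:
            cand = [(s + x) / (count + 1) for s, x in zip(run_sum, real_feats[i])]
            err = sq_dist(cand, target)
            if err < best_err:  # strict '<' keeps the lowest index on ties
                best_i, best_err = i, err
        chosen.append(best_i)
        remaining.remove(best_i)
        run_sum = [s + x for s, x in zip(run_sum, real_feats[best_i])]
        count += 1
    return chosen


real = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # toy 1-D features for one class
print(split_budget(20, 0.2))                        # (4, 16)
print(select_real_conditioned(real, [[0.0]], k=1))  # [4]
print(select_real_conditioned(real, [], k=1))       # [2]
```

In the toy example the synthetic exemplar sits at 0.0 while the real class mean is 2.0, so the conditioned rule picks the real point at 4.0 to pull the hybrid mean back to 2.0; without synthetic data, the same rule picks 2.0. This is one concrete sense in which real exemplars can complement, rather than duplicate, the synthetic ones.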

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 1: Performance Approximation
  • Remark
  • Theorem 1: Performance Approximation
  • Proof