Data-Distill-Net: A Data Distillation Approach Tailored for Replay-based Continual Learning
Wenyang Liao, Quanziang Wang, Yichen Wu, Renzhen Wang, Deyu Meng
TL;DR
This work targets catastrophic forgetting in replay-based continual learning by introducing a data distillation framework that distills cross-task information into a learnable memory buffer. It introduces Data-Distill-Net (DDN), a lightweight hyper-network that generates soft labels for buffer samples, enabling global information distillation with reduced parameterization and avoiding heavy updates to the entire buffer. The approach uses a bi-level optimization to align gradients between the current and past data, and provides theoretical connections showing equivalence to gradient matching. Empirically, DDN improves average accuracy and reduces forgetting across online and offline settings when plugged into multiple replay-based baselines and across standard CL benchmarks, with strong performance especially at tight memory budgets. The method offers practical efficiency and plug-in compatibility, promoting more robust continual learning in resource-constrained environments.
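To make the gradient-alignment idea concrete, the following is a minimal sketch of a gradient-matching measure. The summary does not say which distance the method uses, so the cosine-based loss here is an assumption (it is one common choice), and the function names `cosine_similarity` and `gradient_matching_loss` are hypothetical. Gradients are represented as flat lists of floats.

```python
import math

def cosine_similarity(grad_a, grad_b):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    norm_a = math.sqrt(sum(a * a for a in grad_a))
    norm_b = math.sqrt(sum(b * b for b in grad_b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero gradient has no direction; treat it as unaligned.
        return 0.0
    return dot / (norm_a * norm_b)

def gradient_matching_loss(grad_buffer, grad_reference):
    """Assumed loss: 1 - cosine similarity.

    Near 0 when the two gradients point the same way, 1 when they are
    orthogonal, and near 2 when they point in opposite directions.
    """
    return 1.0 - cosine_similarity(grad_buffer, grad_reference)
```

In a distillation setting, `grad_buffer` would be the gradient of the loss on the memory buffer and `grad_reference` the gradient on the data the buffer is meant to summarize; minimizing the loss pushes the two updates in the same direction.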
Abstract
Replay-based continual learning (CL) methods assume that models trained on a small subset can also effectively minimize the empirical risk of the complete dataset. These methods maintain a memory buffer that stores a sampled subset of data from previous tasks to consolidate past knowledge. However, this assumption is not guaranteed in practice due to the limited capacity of the memory buffer and the heuristic criteria used for buffer data selection. To address this issue, we propose a new dataset distillation framework tailored for CL, which maintains a learnable memory buffer to distill the global information from the current task data and accumulated knowledge preserved in the previous memory buffer. Moreover, to avoid the computational overhead and overfitting risks associated with parameterizing the entire buffer during distillation, we introduce a lightweight distillation module that can achieve global information distillation solely by generating learnable soft labels for the memory buffer data. Extensive experiments show that our method achieves competitive results and effectively mitigates forgetting across various datasets. The source code will be publicly available.
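As an illustration of training on buffer samples with generated soft labels, here is a minimal sketch in plain Python. It is not the authors' implementation: the distillation module is replaced by a hypothetical single linear layer (`generate_soft_label`), and all parameter values are made up. It only shows how a soft label can stand in for a one-hot label in the replay loss.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate_soft_label(features, weights, bias):
    """Hypothetical stand-in for the distillation module: one linear layer
    followed by softmax, mapping sample features to a soft label."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)

def soft_cross_entropy(model_logits, target_probs):
    """Cross-entropy of the model's prediction against a soft target."""
    peak = max(model_logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in model_logits))
    return -sum(t * (z - log_norm) for z, t in zip(model_logits, target_probs))

def replay_loss(model_logits_batch, soft_labels):
    """Mean soft-label cross-entropy over a batch of buffer samples."""
    losses = [
        soft_cross_entropy(z, t)
        for z, t in zip(model_logits_batch, soft_labels)
    ]
    return sum(losses) / len(losses)

# Two-class example with made-up numbers.
weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical generator parameters
bias = [0.0, 0.0]
label = generate_soft_label([2.0, 0.0], weights, bias)
loss = replay_loss([[2.0, 0.0]], [label])
```

Because only the small generator's parameters are learned, the buffer images themselves stay fixed, which is the source of the reduced overhead the abstract describes.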
