Do Your Best and Get Enough Rest for Continual Learning
Hankyul Kang, Gregor Seifer, Donghyun Lee, Jongbin Ryu
TL;DR
This paper tackles catastrophic forgetting in continual learning (CL) by using forgetting-curve theory to optimize the recall interval, i.e., the gap between successive retraining events on the same sample. It introduces the view-batch model (VBM), which combines two components. The first is a replay-with-augmentation scheme that extends the recall interval by using $V$ augmented views per sample. The second is a self-supervised learning (SSL) module trained with a one-to-many divergence loss, $L_{ssl}(f_\theta, \mathcal{B}^\mathcal{V}) = \frac{1}{B \cdot (V-1)} \sum_{i=1}^{B} \sum_{j=2}^{V} D_{KL}(p_i^{1} \,\|\, p_i^{j})$, where $p_i^{j}$ is the predicted distribution for the $j$-th view of the $i$-th sample; this loss is optimized together with the supervised loss. The authors show that a well-chosen recall interval (around $\times 3$ or $\times 4$ in their experiments) yields slower memory decay and higher end-of-training accuracy across multiple CL protocols (CIL, TIL, DIL) and baselines, in both rehearsal and non-rehearsal settings. Extensive experiments on S-CIFAR-10/100, S-Tiny-ImageNet, S-ImageNet-R, and DomainNet show consistent improvements, and the code is released as open source in the GitHub repository linked in the abstract. The work formalizes forgetting-curve concepts for neural networks, showing that strategically spaced learning can substantially improve long-term retention in continual learning while maintaining computational efficiency. The rationale for optimizing the recall interval rests on a memory-retention model, $R(t) = A(bt+1)^{-S}$ with $S = 1 + c(\ln(I+1) - d)^2$, where $I$ is the recall interval.
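To see why a well-chosen recall interval slows forgetting, the retention model quoted above can be evaluated directly. The following is a minimal Python sketch; the values of A, b, c, and d are arbitrary placeholders chosen for illustration, not parameters fitted in the paper. With c > 0, the decay exponent S is smallest when ln(I+1) = d, i.e., at I* = e^d - 1, so R(t) decays slowest at that interval, while intervals that are too short or too long both increase S.

```python
import math


def decay_exponent(interval, c, d):
    """S = 1 + c * (ln(I + 1) - d)^2; for c > 0 this is minimal at I = e^d - 1."""
    return 1.0 + c * (math.log(interval + 1.0) - d) ** 2


def retention(t, interval, A=1.0, b=1.0, c=0.5, d=1.5):
    """R(t) = A * (b * t + 1)^(-S).

    The defaults for A, b, c, d are illustrative placeholders (assumptions),
    not values taken from the paper.
    """
    S = decay_exponent(interval, c, d)
    return A * (b * t + 1.0) ** (-S)


if __name__ == "__main__":
    t = 10.0
    for interval in range(0, 11):
        S = decay_exponent(interval, 0.5, 1.5)
        print(f"I={interval:2d}  S={S:.3f}  R({t:g})={retention(t, interval):.4f}")
    # Larger R(t) means slower forgetting, so pick the interval maximizing it.
    best = max(range(0, 11), key=lambda i: retention(t, i))
    print("slowest decay among integer intervals 0..10: I =", best)
```

With these placeholder values the continuous optimum is I* = e^1.5 - 1 ≈ 3.48 and the best integer interval is 4; every curve starts at R(0) = A regardless of I, and only the rate of decay differs.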
Abstract
According to forgetting curve theory, memory retention can be enhanced by learning extensive information and then taking adequate rest. In other words, to retain new knowledge effectively, it is essential to learn it thoroughly and to rest sufficiently so that the brain can memorize it without forgetting. The main takeaway from this theory is that learning a large amount of data at once requires sufficient rest before the same data is learned again. This aspect of human long-term memory retention can be exploited to address continual learning in neural networks, whose critical problem is retaining new knowledge for a long period without catastrophic forgetting. Therefore, based on Ebbinghaus's theory, we introduce the view-batch model, which adjusts the learning schedule to optimize the recall interval between retraining the same samples. The proposed view-batch model lets the network get enough rest, through a sufficiently long recall interval, to learn extensive knowledge from the same samples. To this end, we present two approaches: 1) a replay method that guarantees the optimal recall interval, and 2) a self-supervised learning method that acquires extensive knowledge from a single training sample at a time. We empirically show that these components of our method align with forgetting curve theory and can enhance long-term memory. Our experiments also demonstrate that our method significantly improves many state-of-the-art continual learning methods across various protocols and scenarios. We open-source this project at https://github.com/hankyul2/ViewBatchModel.
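The two approaches described above can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation: `make_view_batch`, `toy_augment`, `iterations_between_recalls`, and the identity "model" that treats augmented feature vectors directly as logits are stand-ins introduced here. The sketch builds a view batch holding V augmented copies of each sample, evaluates the one-to-many KL loss $\frac{1}{B(V-1)}\sum_i \sum_{j \ge 2} D_{KL}(p_i^1 \,\|\, p_i^j)$ with the first view as the anchor, and counts how many iterations pass before a sample is revisited, assuming a batch with a fixed number of slots and one visit per sample per epoch.

```python
import math
import random


def make_view_batch(samples, num_views, augment):
    """Hypothetical view-batch builder: batch[i][j] is view j of sample i,
    each view produced by an independent call to `augment`."""
    return [[augment(x) for _ in range(num_views)] for x in samples]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q):
    """D_KL(p || q) for discrete distributions; q must be positive wherever p is
    (softmax outputs satisfy this unless they underflow)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def one_to_many_kl(view_logits):
    """1 / (B * (V - 1)) * sum_i sum_{j=2..V} KL(p_i^1 || p_i^j).

    view_logits[i][j] holds the logits of view j of sample i; view 0 is the
    anchor p_i^1 and is compared against every other view of the same sample.
    """
    B = len(view_logits)
    V = len(view_logits[0])
    total = 0.0
    for views in view_logits:
        anchor = softmax(views[0])
        for other in views[1:]:
            total += kl_div(anchor, softmax(other))
    return total / (B * (V - 1))


def iterations_between_recalls(num_samples, batch_slots, num_views):
    """Hypothetical accounting: iterations needed to visit every sample once when
    each sample occupies num_views of the batch_slots in a batch."""
    if num_views > batch_slots:
        raise ValueError("num_views must not exceed batch_slots")
    distinct_per_batch = batch_slots // num_views
    return math.ceil(num_samples / distinct_per_batch)


def toy_augment(x, noise=0.5):
    """Hypothetical augmentation: additive Gaussian noise on a feature vector."""
    return [v + random.gauss(0.0, noise) for v in x]


if __name__ == "__main__":
    random.seed(0)
    samples = [[0.5 * k, -0.2 * k, 1.0] for k in range(4)]  # B = 4 samples, C = 3 classes
    batch = make_view_batch(samples, num_views=3, augment=toy_augment)
    # Identity "model": each augmented vector is used directly as logits.
    print("one-to-many KL:", one_to_many_kl(batch))
    for v in (1, 2, 4):
        print(f"V={v}: {iterations_between_recalls(1000, 32, v)} iterations between recalls")
```

Under the slot-counting assumption, packing V views per sample leaves room for only `batch_slots // V` distinct samples per iteration, so the gap between visits to the same sample grows roughly V-fold (32, 63, and 125 iterations for V = 1, 2, 4 with 1,000 samples and 32 slots). In a deep learning framework one would typically use log-softmax and a library KL routine instead of these list-based helpers.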
