Do Your Best and Get Enough Rest for Continual Learning

Hankyul Kang, Gregor Seifer, Donghyun Lee, Jongbin Ryu

TL;DR

This paper tackles catastrophic forgetting in continual learning by using forgetting-curve theory to optimize the recall interval between retraining events on the same samples. It introduces the view-batch model (VBM), which combines two components with the supervised loss: a replay-with-augmentation component that extends the recall interval by using $V$ views per sample, and a self-supervised learning (SSL) module with a one-to-many divergence loss, $L_{ssl}(f_\theta, \mathcal{B}^\mathcal{V}) = \frac{1}{B \cdot (V-1)} \sum_{i=1}^{B} \sum_{j=2}^{V} D_{KL}(p_i^{1} \| p_i^{j})$. The authors show that an optimal recall interval (around $\times 3$ or $\times 4$ in their experiments) yields slower memory decay and higher end-of-training accuracy across multiple CL protocols (CIL, TIL, DIL) and baselines, in both rehearsal and non-rehearsal settings. Experiments on S-CIFAR-10/100, S-Tiny-ImageNet, S-ImageNet-R, and DomainNet show consistent improvements, and the method is released as open source at the provided GitHub repository. The work formalizes forgetting-curve concepts for neural networks, showing that strategically spaced learning can significantly enhance long-term retention in continual learning while maintaining computational efficiency. The rationale for an optimized recall interval rests on equations modeling memory retention, e.g., $R(t) = A(bt+1)^{-S}$ with $S = 1 + c(\ln(I+1) - d)^2$.
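To make the one-to-many divergence loss concrete, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes that $p_i^{j}$ is the softmax output of the network for view $j$ of sample $i$, that the first view acts as the anchor, and that the logits arrive as an array of shape `(B, V, C)`. The function names `softmax` and `one_to_many_kl` and the smoothing constant `eps` are hypothetical.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def one_to_many_kl(logits, eps=1e-12):
    """Average KL(p_i^1 || p_i^j) over samples i and views j = 2..V.

    logits: array of shape (B, V, C), with V >= 2 views per sample.
    The (B, V, C) layout and the eps smoothing are assumptions of this sketch.
    """
    B, V, _ = logits.shape
    if V < 2:
        raise ValueError("need at least two views per sample")
    p = softmax(logits, axis=-1)
    anchor = p[:, :1, :]   # first view of each sample, shape (B, 1, C)
    others = p[:, 1:, :]   # remaining views, shape (B, V-1, C)
    # KL divergence per (sample, view) pair; anchor broadcasts over the views.
    kl = (anchor * (np.log(anchor + eps) - np.log(others + eps))).sum(axis=-1)
    return kl.sum() / (B * (V - 1))


# Usage: 8 samples, 4 views each, 10 classes.
rng = np.random.default_rng(0)
loss = one_to_many_kl(rng.normal(size=(8, 4, 10)))
```

In training, this term would be added to the supervised loss; how the two terms are weighted is not stated in this summary, so the sketch leaves it out.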

Abstract

According to the forgetting curve theory, we can enhance memory retention by learning extensive data and taking adequate rest. This means that in order to effectively retain new knowledge, it is essential to learn it thoroughly and ensure sufficient rest so that our brain can memorize without forgetting. The main takeaway from this theory is that learning extensive data at once necessitates sufficient rest before learning the same data again. This aspect of human long-term memory retention can be effectively utilized to address the continual learning of neural networks. Retaining new knowledge for a long period of time without catastrophic forgetting is the critical problem of continual learning. Therefore, based on Ebbinghaus' theory, we introduce the view-batch model that adjusts the learning schedules to optimize the recall interval between retraining the same samples. The proposed view-batch model allows the network to get enough rest to learn extensive knowledge from the same samples with a recall interval of sufficient length. To this end, we specifically present two approaches: 1) a replay method that guarantees the optimal recall interval, and 2) a self-supervised learning that acquires extensive knowledge from a single training sample at a time. We empirically show that these approaches of our method are aligned with the forgetting curve theory, which can enhance long-term memory. In our experiments, we also demonstrate that our method significantly improves many state-of-the-art continual learning methods in various protocols and scenarios. We open-source this project at https://github.com/hankyul2/ViewBatchModel.
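The effect of the replay approach on the recall interval can be illustrated with a small scheduling sketch. It is not the released code. It assumes that a view-batch keeps the batch size fixed and fills it with `batch_size // num_views` distinct samples, each repeated as `num_views` augmented views, and that samples are visited in a fixed cyclic order. The names `view_batch_schedule` and `recall_interval` are hypothetical.

```python
from itertools import cycle, islice


def view_batch_schedule(num_samples, batch_size, num_views, num_iters):
    """Return a list of batches; each batch is a list of (sample_id, view_id).

    Assumption of this sketch: the batch size stays fixed, so using more
    views per sample means fewer distinct samples per batch.
    """
    if batch_size % num_views != 0:
        raise ValueError("batch_size must be divisible by num_views")
    distinct_per_batch = batch_size // num_views
    sample_ids = cycle(range(num_samples))  # fixed cyclic visiting order
    schedule = []
    for _ in range(num_iters):
        chosen = list(islice(sample_ids, distinct_per_batch))
        schedule.append([(s, v) for s in chosen for v in range(num_views)])
    return schedule


def recall_interval(schedule, sample_id):
    """Iterations between the first two visits of a sample, or None."""
    visits = [t for t, batch in enumerate(schedule)
              if any(s == sample_id for s, _ in batch)]
    return visits[1] - visits[0] if len(visits) > 1 else None


# Usage: four training samples and a batch size of four.
baseline = view_batch_schedule(4, 4, num_views=1, num_iters=8)
view_batch = view_batch_schedule(4, 4, num_views=4, num_iters=8)
gap_baseline = recall_interval(baseline, sample_id=0)
gap_view_batch = recall_interval(view_batch, sample_id=0)
```

Under these assumptions, raising the number of views from 1 to 4 lengthens the gap between two visits of the same sample from one iteration to four, while each visit covers four views of that sample instead of one.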

Paper Structure

This paper contains 29 sections, 7 equations, 8 figures, 13 tables, 1 algorithm.

Figures (8)

  • Figure 1: Conceptual graph of the forgetting curve. We show (a) short-term recall interval, (b) optimal recall interval, (c) long-term recall interval, and (d) degree of forgetting. (a-b) Expanding the recall interval improves long-term memory retention of neural networks by repeatedly recalling memory with moderate difficulty, whereas (c) an excessive recall interval decreases it. The depicted forgetting curve regarding recall interval is based on the spacing effect formula (Cepeda et al., 2006; Cepeda et al., 2008) provided in the supplementary material.
  • Figure 2: Overview of experimental results. We provide comprehensive comparisons of various factors for continual learning. We perform extensive experiments on step and buffer sizes (a-b), three different continual learning methods (c), whether to use the pre-trained model (d), three different benchmarks (e), and two evaluation protocols (f). In all cases, ours improves the baseline performance consistently.
  • Figure 3: Schematic illustration of the proposed view-batch model. In subfigure (b), we show our view-batch model employing the replay (V=4) and self-supervised learning approach. In contrast to (a) the baseline method, we learn multiple views of the same sample (marked as different shades) using the proposed view-batch self-supervised loss, so that each sample is learned extensively and enough time passes between recalls. For simplicity, we assume in (a) that the training set and the batch both contain four samples, so a single training epoch consists of one batch.
  • Figure 4: Empirical findings related to forgetting curve theory. (a) We report the degree of forgetting for different recall intervals, with the 95% confidence interval shown as a shaded region. The degree of forgetting is measured as the performance degradation of each sample's classification accuracy between recall intervals. (b) We show the decay of memory retention as the learning progresses. This graph shows that when the formal learning stage is finished, memory retention decays over time for all three cases. (c) We compare the network's classification accuracy in continual learning. It shows that ×3 achieves the best performance thanks to the slow memory retention decay.
  • Figure 5: Experimental results on memory retention decay. This analysis reports memory retention decay of the first three tasks on the S-CIFAR-10 dataset, comparing the proposed approach against the baseline methods. The green numbers at the end of the last task are the accuracy gain of the proposed approach over the baselines.
  • ...and 3 more figures
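
The retention formula quoted in the TL;DR, $R(t) = A(bt+1)^{-S}$ with $S = 1 + c(\ln(I+1) - d)^2$, can be evaluated directly to see why an intermediate recall interval decays slowest, as Figure 1 depicts. The sketch below reads $I$ as the recall interval and $t$ as the time elapsed since the last recall, which is an interpretation, not a definition given here. The constants `A`, `b`, `c`, and `d` are placeholder values, not fitted values from the paper, and the function names are hypothetical.

```python
import math


def decay_exponent(interval, c, d):
    """S = 1 + c * (ln(I + 1) - d)^2; smallest when ln(I + 1) equals d."""
    return 1.0 + c * (math.log(interval + 1.0) - d) ** 2


def retention(t, interval, A=1.0, b=1.0, c=1.0, d=1.0):
    """R(t) = A * (b * t + 1)^(-S). Default constants are placeholders."""
    return A * (b * t + 1.0) ** (-decay_exponent(interval, c, d))


# With these placeholder constants the exponent is minimal at I = e^d - 1.
best_interval = math.exp(1.0) - 1.0
for interval in (0.1, best_interval, 50.0):
    print(f"I={interval:6.2f}  R(10)={retention(10.0, interval):.6f}")
```

Because the exponent $S$ grows quadratically in the distance between $\ln(I+1)$ and $d$, intervals that are too short or too long both yield faster decay than the interval at $I = e^{d} - 1$.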