Mitigating Catastrophic Forgetting in Streaming Generative and Predictive Learning via Stateful Replay

Wenzhang Du

TL;DR

The paper tackles catastrophic forgetting in streaming learning under memory constraints by proposing a minimalist yet effective stateful replay approach that uses a fixed-capacity buffer to mix past and current data. It unifies autoencoding, forecasting, and classification under a single negative log-likelihood objective and analyzes forgetting through a gradient-alignment lens, showing when replay can turn potentially harmful updates into benign ones. Empirically, stateful replay substantially reduces average forgetting on heterogeneous multi-task streams (e.g., RotMNIST digit pairs and Airlines groups) while behaving similarly to SeqFT on benign time-based streams, with transparent logging across six streaming scenarios. The work positions stateful replay as a practical, interpretable baseline for continual learning in streaming environments and outlines directions for buffer design, combination with regularization, and scaling to larger, multi-modal systems.

Abstract

Many deployed learning systems must update models on streaming data under memory constraints. The default strategy, sequential fine-tuning on each new phase, is architecture-agnostic but often suffers catastrophic forgetting when later phases correspond to different sub-populations or tasks. Replay with a finite buffer is a simple alternative, yet its behaviour across generative and predictive objectives is not well understood. We present a unified study of stateful replay for streaming autoencoding, time-series forecasting, and classification. We view both sequential fine-tuning and replay as stochastic gradient methods for an ideal joint objective, and use a gradient-alignment analysis to show when mixing current and historical samples should reduce forgetting. We then evaluate a single replay mechanism on six streaming scenarios built from Rotated MNIST, ElectricityLoadDiagrams 2011-2014, and Airlines delay data, using matched training budgets and three seeds. On heterogeneous multi-task streams, replay reduces average forgetting by a factor of two to three, while on benign time-based streams both methods perform similarly. These results position stateful replay as a strong and simple baseline for continual learning in streaming environments.
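The replay mechanism described above, a fixed-capacity buffer whose contents are mixed with current data, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `ReplayBuffer`, `mixed_batch`, and `replay_fraction` are hypothetical, and reservoir sampling is an assumed insertion policy that the text does not specify.

```python
import random


class ReplayBuffer:
    """Fixed-capacity buffer that persists across phases (hypothetical sketch).

    Assumption: reservoir sampling decides which examples are retained.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        """Offer one example; keep it with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, n):
        """Draw up to n stored examples without replacement."""
        n = min(n, len(self.items))
        return self.rng.sample(self.items, n)


def mixed_batch(current, buffer, replay_fraction=0.5):
    """Swap part of the current batch for buffered examples.

    The batch size stays constant, so the per-step budget is unchanged.
    """
    n_replay = min(int(len(current) * replay_fraction), len(buffer.items))
    replayed = buffer.sample(n_replay)
    kept = current[: len(current) - n_replay]
    return kept + replayed
```

In a training loop one would typically call `mixed_batch` before each gradient step and then `add` the current examples to the buffer, so that later phases still see samples from earlier ones.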

Paper Structure

This paper contains 28 sections, 1 theorem, 12 equations, 4 figures, 4 tables.

Key Result

Proposition 1

Fix $k<t$ and $\theta$. Under the proposition's assumptions, which are not reproduced here, there exists $\lambda^\star \in (0,1)$ such that, for all $\lambda \in [\lambda^\star,1]$, the first-order change in $R_k$ under a Replay step is non-positive.
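One first-order reading that is consistent with this conclusion can be sketched as follows. Everything here is an assumption for illustration rather than the paper's own statement: $R_k$ is read as the risk on phase $k$, $g_t$ and $g_B$ denote the current-phase and buffer gradients, $\eta$ is a step size, and the Replay step is taken to be a convex mixture of the two gradients with weight $\lambda$ on the buffer.

```latex
% Assumed Replay step: mix current-phase and buffer gradients with weight \lambda.
\theta^{+} = \theta - \eta\, g_\lambda, \qquad
g_\lambda = (1-\lambda)\, g_t + \lambda\, g_B, \qquad \eta > 0 .

% First-order change in the phase-k risk under that step.
R_k(\theta^{+}) - R_k(\theta)
  \approx -\eta\, \langle \nabla R_k(\theta),\, g_\lambda \rangle
  = -\eta \bigl[ (1-\lambda)\, a + \lambda\, b \bigr],
\qquad
a = \langle \nabla R_k(\theta),\, g_t \rangle, \quad
b = \langle \nabla R_k(\theta),\, g_B \rangle .

% If the current gradient conflicts (a < 0) and the buffer gradient aligns (b > 0):
(1-\lambda)\, a + \lambda\, b \ge 0
\iff
\lambda \ge \lambda^\star := \frac{-a}{\,b - a\,} \in (0,1) .
```

For example, with $a = -2$ and $b = 1$ the threshold is $\lambda^\star = 2/3$: any mixture placing at least two thirds of its weight on the buffer gradient leaves $R_k$ non-increasing to first order, whereas $\lambda = 0$ (pure sequential fine-tuning under this reading) increases it.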

Figures (4)

  • Figure 1: RotMNIST digit-pair classification. For each phase we plot initial (solid) and final (dashed) test accuracy for SeqFT and Replay, averaged over seeds. SeqFT heavily forgets early digit pairs, while Replay preserves most of their performance.
  • Figure 2: Airlines airline-group classification. Replay consistently reduces forgetting on early carrier groups compared to SeqFT, while remaining competitive on later groups.
  • Figure 3: Electricity forecasting under temporal (left) and meter-group (right) splits. Initial and final MSE per phase are nearly identical for SeqFT and Replay, indicating negligible forgetting and some positive transfer.
  • Figure 4: Average forgetting on classification scenarios (SeqFT vs. Replay). Bars show mean forgetting $F_k$ (initial minus final accuracy, in accuracy points) over phases and seeds; error bars indicate standard deviation.
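The forgetting measure plotted in Figure 4 can be computed along the following lines. This is a sketch under assumptions: "initial" is taken to mean accuracy on phase $k$ measured right after training on it, "final" the accuracy on the same phase after the last phase, and the function names and input layout are hypothetical. The numbers in the usage note are made up for illustration and are not results from the paper.

```python
from statistics import mean, stdev


def forgetting(initial_acc, final_acc):
    """Per-phase forgetting F_k = initial accuracy - final accuracy."""
    return [init - fin for init, fin in zip(initial_acc, final_acc)]


def average_forgetting(runs):
    """Mean and standard deviation of F_k pooled over phases and seeds.

    runs: list of (initial_acc, final_acc) pairs, one pair per seed,
    where each element is a list of per-phase accuracies in points.
    """
    values = [f for init, fin in runs for f in forgetting(init, fin)]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread
```

Calling `average_forgetting([([90.0, 80.0], [60.0, 80.0]), ([90.0, 80.0], [70.0, 70.0])])` pools four per-phase values (30, 0, 20, 10) and returns their mean together with the sample standard deviation. Positive values indicate forgetting; negative values would indicate that later training improved an earlier phase.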

Theorems & Definitions (2)

  • Proposition 1: Alignment condition
  • proof