Mitigating Catastrophic Forgetting in Streaming Generative and Predictive Learning via Stateful Replay
Wenzhang Du
TL;DR
The paper tackles catastrophic forgetting in streaming learning under memory constraints by proposing a simple stateful replay approach that uses a fixed-capacity buffer to mix past and current data. It unifies autoencoding, forecasting, and classification under a single negative log-likelihood objective and analyzes forgetting through a gradient-alignment lens, showing when replay can turn potentially harmful updates into benign ones. Empirically, stateful replay substantially reduces average forgetting on heterogeneous multi-task streams (e.g., RotMNIST digit pairs and Airlines groups) while behaving similarly to sequential fine-tuning (SeqFT) on benign time-based streams. Results are logged transparently across six streaming scenarios. The work positions stateful replay as a practical, interpretable baseline for continual learning in streaming environments and outlines directions for buffer design, combination with regularization, and scaling to larger, multi-modal systems.
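The gradient-alignment idea can be illustrated with a standard first-order sketch. The notation is assumed here for illustration and is not taken from the paper: $L_{\mathrm{old}}$ and $L_{\mathrm{new}}$ are the losses on past and current data, $g_{\mathrm{old}}$ and $g_{\mathrm{new}}$ their gradients at the current parameters $\theta$, $\eta$ a small step size, and $\lambda \in [0,1]$ the share of replayed samples in a batch.

```latex
% Sequential fine-tuning: step along the current-data gradient only.
\[
\theta' = \theta - \eta\, g_{\mathrm{new}}
\quad\Longrightarrow\quad
L_{\mathrm{old}}(\theta') - L_{\mathrm{old}}(\theta)
\approx -\eta\, \langle g_{\mathrm{old}},\, g_{\mathrm{new}} \rangle .
\]
% Replay: step along a mixture of current and past gradients.
\[
\theta' = \theta - \eta \bigl[(1-\lambda)\, g_{\mathrm{new}} + \lambda\, g_{\mathrm{old}}\bigr]
\quad\Longrightarrow\quad
L_{\mathrm{old}}(\theta') - L_{\mathrm{old}}(\theta)
\approx -\eta \bigl[(1-\lambda)\, \langle g_{\mathrm{old}},\, g_{\mathrm{new}} \rangle
+ \lambda\, \lVert g_{\mathrm{old}} \rVert^{2}\bigr].
\]
```

Under this approximation, a sequential step increases the past loss whenever the two gradients conflict, that is, when $\langle g_{\mathrm{old}}, g_{\mathrm{new}} \rangle < 0$. The replayed step does not increase it, to first order, as long as $\lambda \lVert g_{\mathrm{old}} \rVert^{2} \ge -(1-\lambda) \langle g_{\mathrm{old}}, g_{\mathrm{new}} \rangle$. When the gradients are already aligned, as one would expect on benign streams, both updates reduce the past loss and the two methods should behave similarly.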
Abstract
Many deployed learning systems must update models on streaming data under memory constraints. The default strategy, sequential fine-tuning on each new phase, is architecture-agnostic but often suffers catastrophic forgetting when later phases correspond to different sub-populations or tasks. Replay with a finite buffer is a simple alternative, yet its behaviour across generative and predictive objectives is not well understood. We present a unified study of stateful replay for streaming autoencoding, time series forecasting, and classification. We view both sequential fine-tuning and replay as stochastic gradient methods for an ideal joint objective, and use a gradient alignment analysis to show when mixing current and historical samples should reduce forgetting. We then evaluate a single replay mechanism on six streaming scenarios built from Rotated MNIST, ElectricityLoadDiagrams 2011-2014, and Airlines delay data, using matched training budgets and three seeds. On heterogeneous multi-task streams, replay reduces average forgetting by a factor of two to three, while on benign time-based streams both methods perform similarly. These results position stateful replay as a strong and simple baseline for continual learning in streaming environments.
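The replay mechanism described above can be sketched as a fixed-capacity buffer that persists across phases and is mixed into each training batch. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `ReplayBuffer` and `mixed_batch`, the `replay_fraction` parameter, and the use of reservoir sampling as the insertion policy are all hypothetical choices made here.

```python
import random


class ReplayBuffer:
    """Fixed-capacity buffer whose contents persist across phases.

    Assumption: reservoir sampling decides which examples are kept, so each
    example seen so far has an equal chance of being in the buffer. The
    paper may use a different insertion policy.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of examples offered to the buffer
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Replace a stored example with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        # Never request more examples than the buffer currently holds.
        k = min(k, len(self.items))
        return self.rng.sample(self.items, k)


def mixed_batch(current_batch, buffer, replay_fraction=0.5):
    """Return the current examples plus replayed ones, then update the buffer.

    replay_fraction is a hypothetical knob: the number of replayed examples
    is this fraction of the current batch size, rounded down.
    """
    n_replay = int(len(current_batch) * replay_fraction)
    replayed = buffer.sample(n_replay)  # drawn before the buffer is updated
    for example in current_batch:
        buffer.add(example)
    return list(current_batch) + replayed
```

Sequential fine-tuning corresponds to `replay_fraction=0`, where each batch contains only current-phase examples. With a positive fraction, the training step itself is unchanged and only the batch composition differs, which is what keeps the approach architecture-agnostic.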
