Information-Theoretic Generalization Bounds of Replay-based Continual Learning
Wen Wen, Tieliang Gong, Zeyu Gao, Yunjiao Zhang, Weizhan Zhang, Yong-Jin Liu
TL;DR
This paper studies the generalization behavior of replay-based continual learning under memory constraints through an information-theoretic framework. It derives three families of bounds (hypothesis-based, prediction-based, and SGLD-specific) that quantify how the memory buffer and the current-task data influence generalization through mutual information and conditional mutual information terms, achieving fast rates in the supersample setting. The bounds reveal a memory–dependency trade-off: increasing the number of exemplars reduces the memory-approximation error but can increase the information dependence between the hypothesis and the memory buffer, which highlights the value of representative memory samples that carry little information about the hypothesis. Empirical results on MNIST and CIFAR-10 show that the bounds track the observed generalization dynamics and that the loss-based bounds are particularly tight and computationally practical for deep learning in replay-based CL.
Abstract
Continual learning (CL) has emerged as a dominant paradigm for acquiring knowledge from sequential tasks while avoiding catastrophic forgetting. Although many CL methods achieve impressive empirical performance, the theoretical understanding of their generalization behavior remains limited, particularly for replay-based approaches. This paper establishes a unified theoretical framework for replay-based CL and derives a series of information-theoretic generalization bounds that make explicit how the memory buffer, alongside the current task, affects generalization performance. Specifically, our hypothesis-based bounds capture the trade-off between the number of selected exemplars and the information dependency between the hypothesis and the memory buffer. Our prediction-based bounds yield tighter and computationally tractable upper bounds on the generalization error by leveraging low-dimensional variables. The analysis is general and applies to a wide range of learning algorithms, with stochastic gradient Langevin dynamics (SGLD) as a representative example. Comprehensive experimental evaluations demonstrate that the derived bounds capture the generalization dynamics in replay-based CL settings.
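To make the SGLD setting mentioned in the abstract concrete, the following is a minimal sketch of a noisy gradient step on a mini-batch that mixes current-task samples with exemplars replayed from the memory buffer. It is an illustration under stated assumptions, not the paper's algorithm or experimental setup: the scalar linear model, the squared loss, the batch sizes, the step size `lr`, the inverse temperature `beta`, and the function name `sgld_replay_step` are all hypothetical, and for simplicity the toy memory buffer and current task share the same ground-truth weight.

```python
import math
import random


def sgld_replay_step(w, current_batch, memory_batch, lr, beta, rng):
    """One SGLD step on current-task samples plus replayed exemplars.

    Each batch is a list of (x, y) pairs; the model is y_hat = w * x
    with loss 0.5 * (w * x - y) ** 2 (an illustrative assumption).
    """
    batch = current_batch + memory_batch
    # Average gradient of the squared loss over the mixed batch.
    grad = sum((w * x - y) * x for x, y in batch) / len(batch)
    # Standard SGLD: gradient step plus Gaussian noise scaled by sqrt(2 * lr / beta).
    noise = rng.gauss(0.0, 1.0)
    return w - lr * grad + math.sqrt(2.0 * lr / beta) * noise


rng = random.Random(0)
w_true = 1.5  # hypothetical ground-truth weight shared by both data sources

# Toy data: a larger current-task set and a small memory buffer of exemplars.
current_task = [(x, w_true * x) for x in (rng.gauss(0, 1) for _ in range(200))]
memory_buffer = [(x, w_true * x) for x in (rng.gauss(0, 1) for _ in range(20))]

w = 0.0
for _ in range(2000):
    cur = rng.sample(current_task, 16)   # mini-batch from the current task
    mem = rng.sample(memory_buffer, 4)   # replayed exemplars from memory
    w = sgld_replay_step(w, cur, mem, lr=0.05, beta=1e4, rng=rng)
```

In this sketch, a larger `beta` means less injected noise, so the iterate stays closer to the minimizer of the mixed-batch loss. How the bounds in the paper depend on the noise level and on the memory buffer is specified in the paper itself and is not reproduced here.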
