Information-Theoretic Generalization Bounds of Replay-based Continual Learning

Wen Wen; Tieliang Gong; Zeyu Gao; Yunjiao Zhang; Weizhan Zhang; Yong-Jin Liu

TL;DR

This paper studies the generalization behavior of replay-based continual learning under memory constraints by developing an information-theoretic framework. It derives three families of bounds (hypothesis-based, prediction-based, and SGLD-specific) that quantify, through mutual information and conditional mutual information terms, how the memory buffer and the current task data influence generalization, and it achieves fast rates in the supersample setting. The bounds reveal a fundamental memory–dependency trade-off: increasing the number of exemplars reduces the memory-approximation error but can raise the information dependence between the hypothesis and the buffer, which highlights the value of representative memory samples with low information dependence. Empirical results on MNIST and CIFAR-10 show that the bounds track the actual generalization dynamics and that loss-based bounds are particularly tight and computationally practical for deep learning in replay-based CL.
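Loss-based bounds are practical because, in the supersample setting, they only require the mutual information between a pair of scalar losses and a binary selection bit, which can be estimated by counting over repeated training runs. The sketch below assumes the generic single-task form (1/n) Σ_i sqrt(2 I(Λ_i; U_i)) for losses in [0, 1], where Λ_i is the pair of losses on the i-th supersample pair and U_i indicates which element was used for training. It is not the paper's replay-specific bound, and the function names `plugin_mutual_information` and `loss_cmi_bound` are hypothetical. Plug-in estimates from only a few runs are biased upward, so the numbers it produces are illustrative only.

```python
import math
from collections import Counter

# Hypothetical helpers: a generic supersample loss-based bound,
# not the replay-specific bound derived in the paper.


def plugin_mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y)
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return max(0.0, mi)  # clip tiny negative values caused by rounding


def loss_cmi_bound(runs_per_index):
    """Average of sqrt(2 * I(Lambda_i; U_i)) over supersample indices i.

    runs_per_index[i] holds one ((loss0, loss1), u) tuple per independent
    training run: loss0 and loss1 are the 0-1 losses of the learned hypothesis
    on the two candidates of pair i, and u in {0, 1} marks which candidate
    was placed in the training set.
    """
    terms = []
    for runs in runs_per_index:
        loss_pairs = [pair for pair, _ in runs]
        bits = [u for _, u in runs]
        terms.append(math.sqrt(2.0 * plugin_mutual_information(loss_pairs, bits)))
    return sum(terms) / len(terms)
```

For example, two runs in which the training candidate always has loss 0 and the held-out candidate loss 1, `loss_cmi_bound([[((0, 1), 0), ((1, 0), 1)]])`, give sqrt(2 ln 2) ≈ 1.18. That value is vacuous for a 0-1 loss, which is why meaningful estimates need many runs per index.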

Abstract

Continual learning (CL) has emerged as a dominant paradigm for acquiring knowledge from sequential tasks while avoiding catastrophic forgetting. Although many CL methods have demonstrated impressive empirical performance, the theoretical understanding of their generalization behavior remains limited, particularly for replay-based approaches. This paper establishes a unified theoretical framework for replay-based CL and derives a series of information-theoretic generalization bounds that explicitly characterize how the memory buffer, alongside the current task, affects generalization performance. Specifically, our hypothesis-based bounds capture the trade-off between the number of selected exemplars and the information dependency between the hypothesis and the memory buffer. Our prediction-based bounds yield tighter and computationally tractable upper bounds on the generalization error by leveraging low-dimensional variables. The theoretical analysis is general and applies to a wide range of learning algorithms, with stochastic gradient Langevin dynamics (SGLD) as a representative example. Comprehensive experiments demonstrate that the derived bounds capture the generalization dynamics of replay-based CL.
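The abstract uses SGLD as the representative algorithm. As a hedged illustration of SGLD in a replay setting, the sketch below pools current-task samples with a fixed memory buffer and applies the standard update w ← w − η∇L̂(w) + sqrt(2η/β)·ξ with ξ ~ N(0, 1). The scalar least-squares model, the uniform minibatch sampling over the pooled data, and all names (`replay_sgld`, `squared_loss_grad`, `inverse_temperature`, and so on) are assumptions made for illustration, not the paper's experimental setup.

```python
import math
import random

# Hypothetical illustration of replay-based SGLD on a scalar least-squares model.


def squared_loss_grad(w, batch):
    """Gradient in w of the mean of 0.5 * (w * x - y) ** 2 over a batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def replay_sgld(current_data, memory_buffer, steps=500, lr=0.01,
                inverse_temperature=1e4, batch_size=8, seed=0):
    """Run SGLD on the pooled current-task data and replay memory buffer.

    Each step samples a minibatch uniformly (without replacement) from the
    pooled data, takes a gradient step, and injects Gaussian noise with
    variance 2 * lr / inverse_temperature.
    """
    pooled = list(current_data) + list(memory_buffer)
    if not pooled:
        raise ValueError("need at least one current-task or buffer sample")
    rng = random.Random(seed)
    noise_std = math.sqrt(2.0 * lr / inverse_temperature)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(pooled, min(batch_size, len(pooled)))
        w = w - lr * squared_loss_grad(w, batch) + noise_std * rng.gauss(0.0, 1.0)
    return w
```

For instance, `replay_sgld([(x, 2 * x) for x in range(1, 6)], [(x, 2 * x) for x in range(6, 11)])` returns a value close to 2. The injected Gaussian noise is what makes the output a noisy function of the training data. Information-theoretic analyses of SGLD typically bound the mutual information between the output and the data by summing per-step terms that depend on the gradient magnitudes, the step size, and the noise variance.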

Figures (4)

Theorems & Definitions (30)