Compact Memory for Continual Logistic Regression
Yohan Jung, Hyungi Lee, Wenlong Chen, Thomas Möllenhoff, Yingzhen Li, Juho Lee, Mohammad Emtiyaz Khan
TL;DR
This work addresses the gap between continual and batch learning by proposing a compact memory approach for continual logistic regression. It reframes memory selection as Hessian matching and estimates the memory with Probabilistic PCA (PPCA) within a Bayesian knowledge-prior (K-prior) framework, enabling near-batch performance with a much smaller memory footprint. Empirical results on binary and multi-class tasks, including Split-ImageNet with memory as small as 0.3–2% of the data, show substantial improvements over replay and prior methods, closing much of the gap to batch accuracy. The method offers a principled, memory-efficient direction for continual learning, with extensions to multi-class GLMs and potential applicability to deeper models that use frozen feature extractors.
Abstract
Despite recent progress, continual learning still does not match the performance of batch training. To avoid catastrophic forgetting, we need to build a compact memory of essential past knowledge, but no clear solution has yet emerged, even for shallow neural networks with just one or two layers. In this paper, we present a new method to build compact memory for logistic regression. Our method is based on a result by Khan and Swaroop [2021], who show the existence of an optimal memory for such models. We formulate the search for the optimal memory as Hessian matching and propose a probabilistic PCA method to estimate it. Our approach can drastically improve accuracy compared to Experience Replay. For instance, on Split-ImageNet with a memory size equivalent to 0.3% of the data size, we get 60% accuracy compared to 30% obtained by replay. Increasing the memory size to 2% further boosts the accuracy to 74%, closing much of the gap to the batch accuracy of 77.6% on this task. Our work opens a new direction for building compact memory that may also be useful in the future for continual deep learning.
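To make the Hessian-matching idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a binary logistic-regression loss whose Hessian at weights w is X^T diag(s(1-s)) X, and it swaps in a plain truncated eigendecomposition (ordinary PCA on the Hessian) in place of the probabilistic PCA estimator described in the abstract. The curvature weights are absorbed into the memory vectors for simplicity. The function names `logistic_hessian` and `compact_memory`, and the synthetic data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_hessian(X, w):
    """Hessian of the summed logistic loss at w: X^T diag(s(1-s)) X."""
    s = sigmoid(X @ w)
    return (X * (s * (1.0 - s))[:, None]).T @ X


def compact_memory(H, k):
    """Return k vectors U (shape k x d) so that U.T @ U is the best
    rank-k approximation of the symmetric PSD matrix H.

    Assumption: plain truncated eigendecomposition, not PPCA.
    """
    eigvals, eigvecs = np.linalg.eigh(H)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    lam = np.clip(eigvals[top], 0.0, None)      # guard against tiny negatives
    return (eigvecs[:, top] * np.sqrt(lam)).T


# Synthetic data with low-dimensional structure (an illustrative assumption),
# so that a small memory can capture most of the curvature.
rng = np.random.default_rng(0)
n, d, k = 500, 20, 5
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.01 * rng.normal(size=(n, d))
w = 0.1 * rng.normal(size=d)

H = logistic_hessian(X, w)        # curvature from all n past examples
U = compact_memory(H, k)          # k memory vectors standing in for n examples
rel_err = np.linalg.norm(H - U.T @ U) / np.linalg.norm(H)
print(f"memory size: {k}/{n} vectors, relative Hessian error: {rel_err:.2e}")
```

Here the k rows of `U` play the role of a memory whose outer-product sum approximates the Hessian accumulated over all past examples. How well a small memory works in this sketch depends on how quickly the Hessian's eigenvalues decay, which the synthetic data is constructed to favor.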
