Compact Memory for Continual Logistic Regression

Yohan Jung, Hyungi Lee, Wenlong Chen, Thomas Möllenhoff, Yingzhen Li, Juho Lee, Mohammad Emtiyaz Khan

TL;DR

This work addresses the gap between continual and batch learning by proposing a compact memory approach for continual logistic regression. It reframes memory selection as Hessian matching and estimates the memory via probabilistic PCA (PPCA) within the knowledge-adaptation prior (K-prior) framework. This gives near-batch performance with a much smaller memory footprint. Empirical results on binary and multi-class tasks, including Split-ImageNet with memory as small as 0.3–2% of the data, show substantial improvements over replay and prior methods and close much of the gap to batch accuracy. The method offers a principled, memory-efficient direction for continual learning. It extends to multi-class GLMs and may apply to deeper models that use frozen feature extractors.

Abstract

Despite recent progress, continual learning still does not match the performance of batch training. To avoid catastrophic forgetting, we need to build a compact memory of essential past knowledge, but no clear solution has yet emerged, even for shallow neural networks with just one or two layers. In this paper, we present a new method to build compact memory for logistic regression. Our method builds on a result by Khan and Swaroop [2021], who show that optimal memory exists for such models. We formulate the search for the optimal memory as Hessian matching and propose a probabilistic PCA method to estimate it. Our approach can drastically improve accuracy compared to Experience Replay. For instance, on Split-ImageNet, we get 60% accuracy compared to the 30% obtained by replay, with a memory size equivalent to 0.3% of the data size. Increasing the memory size to 2% further boosts the accuracy to 74%, closing the gap to the batch accuracy of 77.6% on this task. Our work opens a new direction for building compact memory that may also be useful for continual deep learning in the future.
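
The abstract frames memory construction as Hessian matching solved with probabilistic PCA. The sketch below is minimal and rests on several assumptions. It uses binary logistic regression. It fits the closed-form maximum-likelihood PPCA of Tipping and Bishop directly to the data part of the Hessian, so that H ≈ Σ_k w_k u_k u_kᵀ + σ²I. It also uses a hypothetical mapping in which the memory vectors u_k are the unit-normalized PPCA loading directions and the weights w_k are their squared norms. This is not necessarily the authors' estimator, and it ignores how the actual method reweights memory through the model's predictions. The function names are illustrative.

```python
import numpy as np

def logistic_hessian_data(X, theta):
    """Data part of the binary logistic-regression Hessian at theta:
    sum_i s_i (1 - s_i) x_i x_i^T with s_i = sigmoid(x_i^T theta)."""
    s = 1.0 / (1.0 + np.exp(-(X @ theta)))
    lam = s * (1.0 - s)                          # per-example curvature weights
    return (X * lam[:, None]).T @ X

def ppca_memory(H_data, K):
    """Closed-form maximum-likelihood PPCA (Tipping and Bishop) applied to H_data.

    Returns hypothetical memory vectors U (K x d, unit rows), weights w (K,),
    and noise variance sigma2, so that H_data ~= U^T diag(w) U + sigma2 * I."""
    d = H_data.shape[0]
    evals, evecs = np.linalg.eigh(H_data)
    evals, evecs = evals[::-1], evecs[:, ::-1]   # largest eigenvalues first
    sigma2 = evals[K:].mean() if K < d else 0.0  # average of the discarded eigenvalues
    W = evecs[:, :K] * np.sqrt(np.maximum(evals[:K] - sigma2, 0.0))
    norms = np.linalg.norm(W, axis=0)
    U = (W / np.where(norms > 0, norms, 1.0)).T  # unit-norm memory vectors
    return U, norms**2, sigma2

rng = np.random.default_rng(0)
N, d = 2000, 20
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))  # correlated features
theta = 0.1 * rng.normal(size=d)
H_data = logistic_hessian_data(X, theta)

errs = []
for K in (1, 2, 5, 10, 20):
    U, w, sigma2 = ppca_memory(H_data, K)
    H_hat = U.T @ (w[:, None] * U) + sigma2 * np.eye(d)
    errs.append(np.linalg.norm(H_data - H_hat) / np.linalg.norm(H_data))
    print(f"K={K:2d}  relative Hessian error={errs[-1]:.3e}")
```

The reconstruction keeps the top-K eigenvalues exactly and replaces the rest by their mean σ². The Frobenius error therefore never increases as K grows, and K = d recovers the Hessian exactly. In this sketch the memory stores K vectors instead of N data points.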

Paper Structure

This paper contains 37 sections, 1 theorem, 43 equations, 27 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

For the problem in eq:linreg, the K-prior shown in eq:kprior_proposed is equivalent to the original loss and takes a quadratic form if we set $\boldsymbol{\theta}_t \gets \boldsymbol{\theta}_t^*$, $\mathbf{U}_t \gets \mathbf{U}_{\dots}$ […] where $\mathbf{H}(\boldsymbol{\theta}_t) = \dots$ […]
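
The theorem statement above is truncated. The fact behind a quadratic form being exact in the linear-regression case can still be checked numerically. Under a squared-error loss, the past loss equals its value at the minimizer θ* plus ½(θ − θ*)ᵀH(θ − θ*), with H = XᵀX, because the gradient vanishes at θ*. In addition, H can be reproduced by d weighted vectors instead of N data points. The sketch below assumes that eq:linreg is a plain squared-error objective, since the equation itself is not shown. The eigen-based memory U is a hypothetical choice for illustration and is not the theorem's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 500, 8
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)

def past_loss(theta):
    """Squared-error loss on the past data: 0.5 * ||y - X theta||^2."""
    r = y - X @ theta
    return 0.5 * r @ r

theta_star = np.linalg.lstsq(X, y, rcond=None)[0]   # minimizer of the past loss
loss_star = past_loss(theta_star)                    # one stored scalar
H = X.T @ X                                          # Hessian, constant in theta

# Hypothetical compact memory: d vectors u_k with sum_k u_k u_k^T = H,
# taken from the eigendecomposition H = V diag(lam) V^T.
lam, V = np.linalg.eigh(H)
U = (V * np.sqrt(lam)).T                             # rows u_k = sqrt(lam_k) v_k

def quadratic_surrogate(theta):
    """Past loss rebuilt from theta_star, loss_star, and the d memory vectors only."""
    diff = U @ (theta - theta_star)
    return loss_star + 0.5 * diff @ diff

for _ in range(3):
    theta = rng.normal(size=d)
    print(past_loss(theta), quadratic_surrogate(theta))
```

The two columns agree up to floating-point error. For logistic regression the loss is not quadratic, so this equivalence holds only approximately. That gap is why the memory has to be estimated rather than read off directly.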

Figures (27)

  • Figure 1: Standard continual-learning methods, such as those using weight regularization, use past parameters $\boldsymbol{\theta}_t$ to update $\boldsymbol{\theta}_{t+1}$ for the new task $\mathcal{D}_{t+1}$ (shown on the left). We propose a new method (shown on the right) that also builds compact memory and reuses it to learn continually. The memory consists of a set $\mathbf{U}_t$ of memory vectors and a weight vector $\mathbf{w}_t$. These are used to update $\boldsymbol{\theta}_{t+1}$ when new data $\mathcal{D}_{t+1}$ arrive. Afterward, the memory is updated to get the new $(\mathbf{U}_{t+1}, \mathbf{w}_{t+1})$.
  • Figure 2: Hessian Matching by PPCA
  • Figure 3: Binary logistic regression on a toy 'four-moon' dataset with four tasks (red$\rightarrow$blue$\rightarrow$black$\rightarrow$green). The gray line shows the boundary from batch training, and the boundaries of our compact-memory method are shown in color. The estimated boundary closely tracks the batch solution and recovers it at the end of training. Circles mark the points in the input space that are close to the memories in terms of the loss gradient.
  • Figure 4: USPS odd vs. even
  • Figure 5: Representation of learned memories
  • ...and 22 more figures
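
Figure 1 describes a loop. The memory $(\mathbf{U}_t, \mathbf{w}_t)$ is used to update $\boldsymbol{\theta}_{t+1}$ on $\mathcal{D}_{t+1}$, and then the memory is refreshed. The sketch below shows only that data flow, under simplifying assumptions. It uses binary logistic regression on synthetic tasks. It replaces the paper's K-prior with a quadratic penalty ½(θ − θ_t)ᵀ(Σ_k w_k u_k u_kᵀ + δI)(θ − θ_t) built from the memory. The memory update keeps the top-K eigenpairs of the accumulated Hessian, a plain-PCA stand-in for the PPCA estimate. In effect this behaves like a Laplace-style weight regularizer with low-rank curvature, so it does not reproduce the accuracy of the proposed method. The names `fit` and `compress`, the tasks, K, and δ are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(X, y, theta, theta0, A):
    """Logistic loss on (X, y) plus the memory quadratic 0.5 (theta-theta0)^T A (theta-theta0)."""
    z = X @ theta
    diff = theta - theta0
    return np.sum(np.logaddexp(0.0, z) - y * z) + 0.5 * diff @ A @ diff

def fit(X, y, theta0, A, n_iter=50):
    """Minimize `objective` with damped Newton steps, starting from theta0."""
    theta = theta0.copy()
    for _ in range(n_iter):
        s = sigmoid(X @ theta)
        g = X.T @ (s - y) + A @ (theta - theta0)
        if np.linalg.norm(g) < 1e-10:
            break
        H = (X * (s * (1 - s))[:, None]).T @ X + A
        step = np.linalg.solve(H, g)
        t, f0 = 1.0, objective(X, y, theta, theta0, A)
        while objective(X, y, theta - t * step, theta0, A) > f0 and t > 1e-8:
            t *= 0.5                                # backtrack until the step decreases the objective
        theta = theta - t * step
    return theta

def compress(H_data, K):
    """Keep the top-K eigenpairs so that sum_k w_k u_k u_k^T ~= H_data (PCA stand-in for PPCA)."""
    evals, evecs = np.linalg.eigh(H_data)
    idx = np.argsort(evals)[::-1][:K]
    return evecs[:, idx].T, np.maximum(evals[idx], 0.0)   # U: (K, d), w: (K,)

rng = np.random.default_rng(0)
d, K, delta = 10, 3, 1e-2
theta_true = rng.normal(size=d)
tasks = []
for shift in range(3):                              # three tasks with shifted inputs
    X = rng.normal(size=(300, d)) + 0.5 * shift
    y = (rng.random(300) < sigmoid(X @ theta_true)).astype(float)
    tasks.append((X, y))

theta = np.zeros(d)
U, w = np.zeros((0, d)), np.zeros(0)                # empty memory before the first task
for X, y in tasks:
    A = U.T @ (w[:, None] * U) + delta * np.eye(d)  # quadratic built from memory (U_t, w_t)
    theta = fit(X, y, theta, A)                     # update theta_{t+1} on the new task
    s = sigmoid(X @ theta)
    H_new = U.T @ (w[:, None] * U) + (X * (s * (1 - s))[:, None]).T @ X
    U, w = compress(H_new, K)                       # refresh memory (U_{t+1}, w_{t+1})

for i, (X, y) in enumerate(tasks):
    acc = np.mean((sigmoid(X @ theta) > 0.5) == (y == 1))
    print(f"task {i + 1}: accuracy {acc:.3f}")
```

Only K vectors of dimension d and K weights are carried between tasks, regardless of how many data points each task contains.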

Theorems & Definitions (1)

  • Theorem 1