Sample Compression for Self Certified Continual Learning

Jacob Comeau, Mathieu Bazinet, Pascal Germain, Cem Subakan

TL;DR

CoP2L is empirically competitive with baseline methods and effectively mitigates catastrophic forgetting, while certifying the reliability of the learned predictors in continual learning with a non-vacuous generalization bound.

Abstract

Continual learning algorithms aim to learn from a sequence of tasks. To avoid catastrophic forgetting, most existing approaches rely on heuristics and do not provide computable learning guarantees. In this paper, we introduce Continual Pick-to-Learn (CoP2L), a method grounded in sample compression theory that retains representative samples for each task in a principled and efficient way. This allows us to derive non-vacuous, numerically computable upper bounds on the generalization loss of the learned predictors after each task. We evaluate CoP2L on standard continual learning benchmarks under Class-Incremental and Task-Incremental settings, showing that it effectively mitigates catastrophic forgetting. CoP2L is empirically competitive with baseline methods while certifying predictor reliability in continual learning with a non-vacuous bound.

Paper Structure

This paper contains 44 sections, 5 theorems, 23 equations, 14 figures, 27 tables, 5 algorithms.

Key Result

Theorem 2.1

For any distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$, for any family of sets of messages $\{\mathscr{M}(\mathbf{i}) \mid \mathbf{i} \in \ldots\}$ [...] with $\theta_{\mathbf{i}, \mu} = \mathscr{R} \ldots$ [...]
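To give a sense of what a numerically computable sample compression bound looks like, the sketch below evaluates the classical realizable-case bound. It is not Theorem 2.1 and not the bound used by CoP2L. It assumes a predictor reconstructed from $k$ of $n$ i.i.d. samples (no message) that makes zero errors on the remaining $n - k$ samples. A union bound over the $\binom{n}{k}$ index sets and over $k \in \{0, \dots, n-1\}$ then gives, with probability at least $1 - \delta$, a risk of at most $\frac{1}{n-k}\left(\ln\binom{n}{k} + \ln\frac{n}{\delta}\right)$. The function names are hypothetical and chosen for illustration only.

```python
import math


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed with lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def realizable_compression_bound(n: int, k: int, delta: float = 0.05) -> float:
    """Classical realizable-case sample compression bound (illustrative only).

    Assumptions (not taken from the paper): the predictor is reconstructed
    from a compression set of k out of n i.i.d. samples, uses no message,
    and makes zero errors on the n - k samples outside the compression set.
    The returned value upper-bounds the true risk with probability at least
    1 - delta, simultaneously for every k in 0..n-1.
    """
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < n")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # Union bound: C(n, k) index sets of size k, and n possible values of k.
    eps = (log_binom(n, k) + math.log(n / delta)) / (n - k)
    # A risk bound above 1 is vacuous, so clip it.
    return min(1.0, eps)


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(k, realizable_compression_bound(n=10_000, k=k))
```

A bound of this kind is non-vacuous only when the compression set is small relative to the dataset. As $k$ approaches $n$, the value is clipped to 1 and certifies nothing.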

Figures (14)

  • Figure 1: Numerical values of the proposed generalization bounds for continual learning over 10 tasks on CIFAR100, using ViT and ResNet50 backbones. The bounds hold simultaneously for all tasks.
  • Figure 2: Illustration of the behavior of the bound on CIFAR10 with 5 tasks.
  • Figure 3: Illustration of the behavior of the bound on CIFAR100 with 20 tasks.
  • Figure 4: CoP2L bounds with respect to tasks on the MNIST, Fashion-MNIST and EMNIST datasets using an MLP architecture.
  • Figure 5: CoP2L bounds with respect to tasks on the MNIST, Fashion-MNIST and EMNIST datasets using a CNN architecture.
  • ...and 9 more figures

Theorems & Definitions (7)

  • Theorem 2.1: bazinet2024
  • Theorem 3.1
  • Theorem C.1: foong2022note
  • Theorem C.2
  • proof
  • Theorem D.1
  • proof