Prediction Error-based Classification for Class-Incremental Learning

Michał Zając, Tinne Tuytelaars, Gido M. van de Ven

TL;DR

This work tackles class-incremental learning (CIL) by proposing Prediction Error-based Classification (PEC), a per-class, teacher–student framework where each class $c$ has a dedicated student network $g_{\theta^c}$ trained to imitate a fixed random teacher $h_{\phi}$. Inference scores each class by the squared prediction error $||g_{\theta^c}(x) - h_{\phi}(x)||^2$, linking PEC to Gaussian Process posterior variance as a principled uncertainty-based rule. PEC excels in single-pass online CIL across MNIST, SVHN, CIFAR-10/100, and miniImageNet, outperforming rehearsal-free baselines and competing well with rehearsal-based methods under moderate buffer sizes, while keeping the number of hyperparameters low. The method is supported by GP-based theory, robust empirical results, and architectural analyses, and it avoids forgetting through strict per-class modularization, with practical implications for sample efficiency and streaming scenarios.

Abstract

Class-incremental learning (CIL) is a particularly challenging variant of continual learning, where the goal is to learn to discriminate between all classes presented in an incremental fashion. Existing approaches often suffer from excessive forgetting and imbalance of the scores assigned to classes that have not been seen together during training. In this study, we introduce a novel approach, Prediction Error-based Classification (PEC), which differs from traditional discriminative and generative classification paradigms. PEC computes a class score by measuring the prediction error of a model trained to replicate the outputs of a frozen random neural network on data from that class. The method can be interpreted as approximating a classification rule based on Gaussian Process posterior variance. PEC offers several practical advantages, including sample efficiency, ease of tuning, and effectiveness even when data are presented one class at a time. Our empirical results show that PEC performs strongly in single-pass-through-data CIL, outperforming other rehearsal-free baselines in all cases and rehearsal-based methods with moderate replay buffer size in most cases across multiple benchmarks.
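The scoring rule described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the one-hidden-layer ReLU networks, the layer widths, the learning rate, the batch size, the synthetic Gaussian clusters, and all function names (`init_mlp`, `train_student`, `pec_scores`, `pec_predict`) are assumptions chosen for illustration. It only shows the general shape of the idea: a frozen random teacher, one student per class trained in a single pass on that class alone, and prediction by smallest squared error.

```python
import numpy as np

def init_mlp(d_in, d_hidden, d_out, rng):
    """Random one-hidden-layer ReLU network (an illustrative architecture)."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_hidden)),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(d_hidden), size=(d_hidden, d_out)),
    }

def forward(params, x):
    """Return the network output and the hidden activations."""
    hidden = np.maximum(x @ params["W1"], 0.0)
    return hidden @ params["W2"], hidden

def train_student(teacher, x_class, d_hidden, rng, lr=0.01, batch_size=10):
    """Fit one student to the frozen teacher with a single SGD pass over one class."""
    d_in, d_out = x_class.shape[1], teacher["W2"].shape[1]
    student = init_mlp(d_in, d_hidden, d_out, rng)
    for start in range(0, len(x_class), batch_size):
        x = x_class[start:start + batch_size]
        target, _ = forward(teacher, x)  # the teacher is only read, never updated
        out, hidden = forward(student, x)
        # Gradient of 0.5 * mean_i ||out_i - target_i||^2 with respect to `out`.
        grad_out = (out - target) / len(x)
        grad_w2 = hidden.T @ grad_out
        grad_hidden = (grad_out @ student["W2"].T) * (hidden > 0)
        grad_w1 = x.T @ grad_hidden
        student["W1"] -= lr * grad_w1
        student["W2"] -= lr * grad_w2
    return student

def pec_scores(teacher, students, x):
    """Squared prediction error of each class's student, shape (n_samples, n_classes)."""
    target, _ = forward(teacher, x)
    return np.stack(
        [((forward(s, x)[0] - target) ** 2).sum(axis=1) for s in students], axis=1
    )

def pec_predict(teacher, students, x):
    """Predict the class whose student has the smallest prediction error."""
    return pec_scores(teacher, students, x).argmin(axis=1)

# Toy class-incremental stream: classes arrive one at a time (synthetic data).
rng = np.random.default_rng(0)
n_classes, d_in = 3, 8
means = rng.normal(size=(n_classes, d_in))
means = 2.0 * means / np.linalg.norm(means, axis=1, keepdims=True)

teacher = init_mlp(d_in, d_hidden=64, d_out=16, rng=rng)
students = []
for c in range(n_classes):
    x_c = means[c] + 0.3 * rng.normal(size=(200, d_in))
    students.append(train_student(teacher, x_c, d_hidden=32, rng=rng))

x_test = np.concatenate(
    [means[c] + 0.3 * rng.normal(size=(50, d_in)) for c in range(n_classes)]
)
y_test = np.repeat(np.arange(n_classes), 50)
preds = pec_predict(teacher, students, x_test)
print("toy accuracy:", (preds == y_test).mean())
```

Because each student sees only its own class and is never touched again, adding a class amounts to appending one more student to the list; earlier students cannot be overwritten, which is the modularization the summary credits for avoiding forgetting.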

Paper Structure

This paper contains 42 sections, 4 theorems, 13 equations, 6 figures, 24 tables, and 2 algorithms.

Key Result

Proposition 1

Let $H$ be a Gaussian Process and $X$ a dataset. Then the following inequality holds:

Figures (6)

  • Figure 1: Comparison of discriminative classification, generative classification, and Prediction Error-based Classification.
  • Figure 2: CIL performance for varying number of epochs (left) and number of model parameters (right). Plotted are the means (solid or dashed lines) and standard errors (shaded areas) from $10$ seeds. For PEC and VAE-GC, results are the same for both task splits (see Appendix \ref{app:details_methods}), and hence single curves are shown. Vertical dotted lines indicate the settings used for the experiments in Tables \ref{tab:main_one_class} and \ref{tab:main_many_classes}.
  • Figure 3: Impact of architectural choices on PEC's performance. Plotted are the means (solid lines) and standard errors (shaded areas) from $10$ seeds. Vertical dotted lines mark the settings used for the experiments in Tables \ref{tab:main_one_class} and \ref{tab:main_many_classes}.
  • Figure 4: CIL performance for varying learning rate. Plotted are the means (solid or dashed lines) and standard errors (shaded areas) from $10$ seeds. For PEC and VAE-GC, results are the same for both task splits, and hence single curves are shown. For the VAE-GC experiment on CIFAR-10 with learning rate $0.01$, runs failed because of numerical instabilities.
  • Figure 5: CIL performance for varying number of ensemble members. Plotted are the means (solid lines) and standard errors (shaded areas) from $10$ seeds.
  • ...and 1 more figure

Theorems & Definitions (8)

  • Proposition 1: Proposition 1 from ciosek2020conservative, rephrased
  • Proposition 2
  • proof
  • Lemma 1: Lemma 4 from ciosek2020conservative, rephrased
  • proof
  • proof
  • Proposition 3
  • proof