Prediction Error-based Classification for Class-Incremental Learning

Michał Zając; Tinne Tuytelaars; Gido M. van de Ven

TL;DR

This work tackles class-incremental learning (CIL) by proposing Prediction Error-based Classification (PEC), a per-class teacher-student framework in which each class $c$ has a dedicated student network $g_{\theta^c}$ trained to imitate a fixed random teacher $h_{\phi}$. At inference, each class is scored by the squared prediction error $\|g_{\theta^c}(x) - h_{\phi}(x)\|^2$, and the class with the lowest error is predicted. This rule is linked to Gaussian Process posterior variance, which gives it a principled uncertainty-based interpretation. PEC excels in single-pass online CIL on MNIST, SVHN, CIFAR-10/100, and miniImageNet, outperforming rehearsal-free baselines and competing well with rehearsal-based methods at moderate buffer sizes, while requiring few hyperparameters. The method is supported by GP-based theory, strong empirical results, and architectural analyses. It avoids forgetting through strict per-class modularization, with practical benefits for sample efficiency and streaming scenarios.
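The scoring rule described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The architectures (a frozen tanh MLP as the teacher $h_{\phi}$, a one-hidden-layer ReLU MLP as each student $g_{\theta^c}$), the output dimension, the learning rate, the batch size, and the choice to start every student from the same initialization are illustrative assumptions, and the names `RandomTeacher`, `Student`, and `PEC` are hypothetical. Each class is learned in a single pass over its own data, and a test input is assigned to the class whose student has the smallest squared prediction error.

```python
import numpy as np


class RandomTeacher:
    """Frozen random network h_phi: weights are drawn once and never trained."""

    def __init__(self, in_dim, out_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, out_dim))

    def __call__(self, x):
        return np.tanh(x @ self.W1) @ self.W2


class Student:
    """One-hidden-layer ReLU network g_theta^c, trained with plain SGD on squared error."""

    def __init__(self, in_dim, out_dim, hidden=32, seed=1):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, out_dim))

    def __call__(self, x):
        return np.maximum(0.0, x @ self.W1 + self.b1) @ self.W2

    def sgd_step(self, x, target, lr):
        pre = x @ self.W1 + self.b1
        act = np.maximum(0.0, pre)
        # Gradient of the batch mean of ||g(x) - target||^2 w.r.t. the student output.
        d_out = 2.0 * (act @ self.W2 - target) / x.shape[0]
        d_pre = (d_out @ self.W2.T) * (pre > 0)  # backprop through ReLU (uses old W2)
        self.W2 -= lr * act.T @ d_out
        self.W1 -= lr * x.T @ d_pre
        self.b1 -= lr * d_pre.sum(axis=0)


class PEC:
    """One student per class, all imitating one shared frozen teacher."""

    def __init__(self, in_dim, out_dim=16, seed=0):
        self.in_dim, self.out_dim = in_dim, out_dim
        self.teacher = RandomTeacher(in_dim, out_dim, seed=seed)
        self.student_seed = seed + 1  # assumption: all students share one initialization
        self.students = {}

    def learn_class(self, label, X, lr=0.003, batch_size=10):
        """Train a fresh student on one class's data in a single pass over X."""
        student = Student(self.in_dim, self.out_dim, seed=self.student_seed)
        for start in range(0, len(X), batch_size):
            xb = X[start:start + batch_size]
            student.sgd_step(xb, self.teacher(xb), lr)
        self.students[label] = student  # earlier students are never updated again

    def scores(self, X):
        """Squared prediction error of each class's student (lower = more likely)."""
        labels = sorted(self.students)
        target = self.teacher(X)
        errors = [((self.students[c](X) - target) ** 2).sum(axis=1) for c in labels]
        return labels, np.stack(errors, axis=1)

    def predict(self, X):
        labels, errors = self.scores(X)
        return np.asarray(labels)[errors.argmin(axis=1)]


# Toy usage with made-up data: classes arrive one at a time, each seen only once.
rng = np.random.default_rng(42)
pec = PEC(in_dim=2)
pec.learn_class(0, rng.normal(2.0, 0.5, (1000, 2)))
pec.learn_class(1, rng.normal(-2.0, 0.5, (1000, 2)))
X_test = np.vstack([rng.normal(2.0, 0.5, (200, 2)), rng.normal(-2.0, 0.5, (200, 2))])
y_test = np.array([0] * 200 + [1] * 200)
acc = (pec.predict(X_test) == y_test).mean()
```

Because each student is trained only on its own class and is frozen afterwards, learning a new class cannot overwrite earlier students. The sketch does nothing special to make scores from different students comparable, which is the subject of a dedicated section of the paper.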

Abstract

Class-incremental learning (CIL) is a particularly challenging variant of continual learning, where the goal is to learn to discriminate between all classes presented in an incremental fashion. Existing approaches often suffer from excessive forgetting and imbalance of the scores assigned to classes that have not been seen together during training. In this study, we introduce a novel approach, Prediction Error-based Classification (PEC), which differs from traditional discriminative and generative classification paradigms. PEC computes a class score by measuring the prediction error of a model trained to replicate the outputs of a frozen random neural network on data from that class. The method can be interpreted as approximating a classification rule based on Gaussian Process posterior variance. PEC offers several practical advantages, including sample efficiency, ease of tuning, and effectiveness even when data are presented one class at a time. Our empirical results show that PEC performs strongly in single-pass-through-data CIL, outperforming other rehearsal-free baselines in all cases and rehearsal-based methods with moderate replay buffer sizes in most cases across multiple benchmarks.
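The GP interpretation can be made concrete with the rule that PEC is said to approximate: fit a separate GP to each class's inputs and assign a query to the class under which its posterior variance is lowest, i.e., where that class's data leave the least uncertainty. The sketch below computes this rule directly rather than running PEC. The zero-mean prior, the RBF kernel, its lengthscale, the noise level, and the function names are assumptions made for illustration, not taken from the paper. It uses the standard GP posterior variance $k(x, x) - k_x^\top (K + \sigma^2 I)^{-1} k_x$, which depends only on the inputs and not on any targets.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of A and B (assumed prior)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior_variance(X_train, X_query, noise=1e-3, lengthscale=1.0):
    """Posterior variance k(x, x) - k_x^T (K + noise * I)^{-1} k_x at each query."""
    K = rbf_kernel(X_train, X_train, lengthscale) + noise * np.eye(len(X_train))
    K_qx = rbf_kernel(X_query, X_train, lengthscale)
    solved = np.linalg.solve(K, K_qx.T)  # shape (n_train, n_query)
    prior_var = np.ones(len(X_query))    # k(x, x) = 1 for the RBF kernel
    return prior_var - np.sum(K_qx * solved.T, axis=1)


def gp_variance_classify(class_data, X_query, **kernel_args):
    """Predict, for each query, the class with the smallest GP posterior variance."""
    labels = sorted(class_data)
    variances = np.stack(
        [gp_posterior_variance(class_data[c], X_query, **kernel_args) for c in labels],
        axis=1,
    )
    return np.asarray(labels)[variances.argmin(axis=1)]


# Toy example with made-up data: each class's GP only ever sees that class.
rng = np.random.default_rng(0)
class_data = {0: rng.normal(2.0, 0.5, (50, 2)), 1: rng.normal(-2.0, 0.5, (50, 2))}
X_query = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y_query = np.array([0] * 20 + [1] * 20)
pred = gp_variance_classify(class_data, X_query)
```

Like PEC, this rule needs no data from other classes to score a class, so classes can be added one at a time; the cost of exact GP inference, which grows cubically with the number of points per class, is one reason to approximate it with trained networks instead.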

Paper Structure

This paper contains 42 sections, 4 theorems, 13 equations, 6 figures, 24 tables, and 2 algorithms.

Introduction
Preliminaries
Class-incremental learning
Approaches to class-incremental learning
Method
PEC algorithm
Theoretical support for PEC
Experiments
Experimental setup
Performance of PEC
Performance in varying regimes
Comparability of PEC class scores
Impact of architectural choices
Related work
Limitations and future work

Key Result

Proposition 1

Let $H$ be a Gaussian Process and $X$ a dataset. Then the following inequality holds:

Figures (6)

Figure 1: Comparison of discriminative classification, generative classification, and Prediction Error-based Classification.
Figure 2: CIL performance for varying number of epochs (left) and number of model parameters (right). Plotted are the means (solid or dashed lines) and standard errors (shaded areas) from $10$ seeds. For PEC and VAE-GC, results are the same for both task splits (see Appendix \ref{app:details_methods}), and hence single curves are shown. Vertical dotted lines indicate the settings used for the experiments in Tables \ref{tab:main_one_class} and \ref{tab:main_many_classes}.
Figure 3: Impact of architectural choices on PEC's performance. Plotted are the means (solid lines) and standard errors (shaded areas) from $10$ seeds. Vertical dotted lines mark settings used for experiments in Tables \ref{tab:main_one_class} and \ref{tab:main_many_classes}.
Figure 4: CIL performance for varying learning rates. Plotted are the means (solid or dashed lines) and standard errors (shaded areas) from $10$ seeds. For PEC and VAE-GC, results are the same for both task splits, and hence single curves are shown. For the VAE-GC experiment on CIFAR-10 with learning rate $0.01$, runs failed due to numerical instabilities.
Figure 5: CIL performance for varying number of ensemble members. Plotted are the means (solid lines) and standard errors (shaded areas) from $10$ seeds.

Theorems & Definitions (8)

Proposition 1: Proposition 1 from Ciosek et al. (2020), rephrased
Proposition 2
Proof
Lemma 1: Lemma 4 from Ciosek et al. (2020), rephrased
Proof
Proof
Proposition 3
Proof
