Infinite dSprites for Disentangled Continual Learning: Separating Memory Edits from Generalization

Sebastian Dziadzio; Çağatay Yıldız; Gido M. van de Ven; Tomasz Trzciński; Tinne Tuytelaars; Matthias Bethge

TL;DR

The paper tackles catastrophic forgetting in lifelong learning by introducing Infinite dSprites, a procedurally generated, open-ended benchmark that supports hundreds of tasks with controllable transformation factors. It argues that existing continual learning approaches falter over long horizons and proposes Disentangled Continual Learning (DCL), which separates memorization (an exemplar memory buffer) from generalization (an equivariant regressor that predicts affine transformation parameters, followed by a canonicalization step). Standard regularization and replay methods degrade rapidly on Infinite dSprites, while DCL achieves forward and backward transfer, one-shot generalization, and open-set recognition. The work provides a scalable, open-ended continual learning benchmark and motivates further research into decoupling memory from generalization and into principled inductive biases for long-horizon learning, with practical implications for open-world recognition and rapid adaptation.
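The split between an exemplar buffer and a canonicalization step can be illustrated with a toy sketch. This is not the authors' implementation: it works on 2D point sets rather than images, assumes known point correspondence, and takes the affine parameters as given instead of predicting them with a trained equivariant network. All function names and the example shapes are hypothetical.

```python
import numpy as np

def affine_matrix(tx, ty, angle, scale):
    """3x3 homogeneous matrix: rotate by angle, scale uniformly, then translate."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[scale * c, -scale * s, tx],
                     [scale * s,  scale * c, ty],
                     [0.0,        0.0,       1.0]])

def apply_affine(matrix, points):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ matrix.T)[:, :2]

def canonicalize(points, predicted_matrix):
    """Undo the predicted transformation to map the input to canonical pose."""
    return apply_affine(np.linalg.inv(predicted_matrix), points)

def classify(points, predicted_matrix, exemplars):
    """Return the label of the stored exemplar closest to the canonicalized input."""
    canonical = canonicalize(points, predicted_matrix)
    distances = {label: np.mean((canonical - exemplar) ** 2)
                 for label, exemplar in exemplars.items()}
    return min(distances, key=distances.get)

# Exemplar buffer: one canonical point set per class (hypothetical shapes).
exemplars = {
    "triangle": np.array([[0.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]),
    "wedge": np.array([[0.0, 0.5], [-1.0, -0.5], [1.0, -1.5]]),
}

# Assumption: the pose is supplied directly; in DCL it would be predicted by a network.
pose = affine_matrix(tx=0.3, ty=-0.2, angle=0.7, scale=1.5)
observed = apply_affine(pose, exemplars["wedge"])
predicted_label = classify(observed, pose, exemplars)
```

In this sketch, adding a new class only appends an entry to `exemplars`; the canonicalization code is shared across classes, which is the separation the summary describes.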

Abstract

The ability of machine learning systems to learn continually is hindered by catastrophic forgetting, the tendency of neural networks to overwrite previously acquired knowledge when learning a new task. Existing methods mitigate this problem through regularization, parameter isolation, or rehearsal, but they are typically evaluated on benchmarks comprising only a handful of tasks. In contrast, humans are able to learn over long time horizons in dynamic, open-world environments, effortlessly memorizing unfamiliar objects and reliably recognizing them under various transformations. To make progress towards closing this gap, we introduce Infinite dSprites, a parsimonious tool for creating continual classification and disentanglement benchmarks of arbitrary length and with full control over generative factors. We show that over a sufficiently long time horizon, the performance of all major types of continual learning methods deteriorates on this simple benchmark. This result highlights an important and previously overlooked aspect of continual learning: given a finite modelling capacity and an arbitrarily long learning horizon, efficient learning requires memorizing class-specific information and accumulating knowledge about general mechanisms. In a simple setting with direct supervision on the generative factors, we show how learning class-agnostic transformations offers a way to circumvent catastrophic forgetting and improve classification accuracy over time. Our approach sets the stage for continual learning over hundreds of tasks with explicit control over memorization and forgetting, emphasizing open-set classification and one-shot generalization.
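As a rough illustration of what "full control over generative factors" could look like, the sketch below enumerates every combination of four factors (horizontal position, vertical position, orientation, scale) with two values each for four shapes. The factor names, the specific values, and the `factor_grid` helper are assumptions made for this example and are not taken from the Infinite dSprites code.

```python
from itertools import product

# Hypothetical factor values: two settings per factor.
factor_values = {
    "position_x": (0.25, 0.75),
    "position_y": (0.25, 0.75),
    "orientation": (0.0, 1.57),
    "scale": (0.5, 1.0),
}
shape_ids = range(4)

def factor_grid(shape_ids, factor_values):
    """Yield one dict of generative factors per (shape, factor combination)."""
    names = list(factor_values)
    for shape_id in shape_ids:
        for combo in product(*factor_values.values()):
            yield {"shape_id": shape_id, **dict(zip(names, combo))}

# 4 shapes x 2^4 factor combinations.
batch = list(factor_grid(shape_ids, factor_values))
```

Each dictionary fully specifies one sample, so ground-truth factors are available as supervision for every generated image.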

Paper Structure (34 sections, 2 equations, 12 figures, 1 table, 2 algorithms)

Introduction
Contributions
Motivation: Three issues with class-incremental continual learning
Benchmarking
Invariant representations
Pre-trained models
Methods
Infinite dSprites
Disentangled continual learning
Implementation and training objective
Discussion
Related work
Continual learning
Benchmarking continual learning
Experiments
...and 19 more sections

Figures (12)

Figure 1: Schematic illustration of the conceptual DCL framework with two modules transferred across tasks: (1) the canonicalization network, consisting of an equivariant module that estimates the parameters of a transformation and a normalization module that applies this transformation to the input, and (2) a buffer that stores class-specific exemplars. The canonicalization network is trained continually to map each image to its exemplar. At test time, we return the label of the exemplar closest to the normalized input.
Figure 1: Average test accuracy on all past tasks for DCL and A-GEM.
Figure 2: A batch of images from the Infinite dSprites dataset containing four distinct shapes. Each shape is shown in all combinations of four factors of variation (horizontal and vertical position, orientation, and scale) with two possible values per factor.
Figure 3: Average test accuracy on all past tasks for DCL and standard regularization methods: LwF, SI, and EWC.
Figure 3: Cumulative test accuracy of DCL in the online scenario (every image is seen only once) and two offline scenarios (3 and 5 training epochs per task).
...and 7 more figures
