Continual Learning in the Presence of Repetition

Hamed Hemati; Lorenzo Pellegrini; Xiaotian Duan; Zixuan Zhao; Fangfang Xia; Marc Masana; Benedikt Tscheschner; Eduardo Veas; Yuxiang Zheng; Shiji Zhao; Shao-Yuan Li; Sheng-Jun Huang; Vincenzo Lomonaco; Gido M. van de Ven

TL;DR

This paper investigates class-incremental learning with repetition (CIR) through the CLVision challenge at CVPR 2023, examining how the re-occurrence of previously seen classes within a stream affects learning. It introduces a four-parameter data-stream generator for creating CIR benchmarks and analyzes the three finalist strategies (HAT-CIR, Horde, and DWGRNet), all of which are ensemble-based and exploit repetition without storing raw samples. The results show that ensemble approaches substantially outperform baselines in CIR scenarios, with HAT-CIR achieving the best overall performance. Removing repetition reduces their advantage, which indicates that repetition plays a central role in how effective a strategy is. The findings suggest that natural repetition can act much like bagging, enabling richer representations and more robust continual learning, and they point to future work on scaling ensembles and exploring larger architectures for CIR tasks.

Abstract

Continual learning (CL) provides a framework for training models in ever-evolving environments. Although re-occurrence of previously seen objects or tasks is common in real-world problems, the concept of repetition in the data stream is not often considered in standard benchmarks for CL. Unlike with the rehearsal mechanism in buffer-based strategies, where sample repetition is controlled by the strategy, repetition in the data stream naturally stems from the environment. This report provides a summary of the CLVision challenge at CVPR 2023, which focused on the topic of repetition in class-incremental learning. The report initially outlines the challenge objective and then describes three solutions proposed by finalist teams that aim to effectively exploit the repetition in the stream to learn continually. The experimental results from the challenge highlight the effectiveness of ensemble-based solutions that employ multiple versions of similar modules, each trained on different but overlapping subsets of classes. This report underscores the transformative potential of taking a different perspective in CL by employing repetition in the data stream to foster innovative strategy design.
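To make the ensemble idea concrete, the following is a minimal sketch of the test-time readout described in the Figure 3 caption: each class is scored by a weighted average of the logits from the most recent fragments that were trained on an experience containing that class. The function name `class_scores`, the cut-off `k`, and the geometric recency weights (`decay`) are assumptions made for illustration; the source does not state how many fragments are used or how they are weighted.

```python
def class_scores(fragments, n_classes, k=2, decay=0.5):
    """Score each class from the k most recent fragments that saw it.

    fragments: list ordered oldest to newest; each entry is a pair
        (seen_classes, logits), where seen_classes is a set of class ids
        present in the experience the fragment was trained on, and logits
        maps class id -> logit for the current test sample.
    k, decay: hypothetical choices (not from the paper) for how many
        fragments to use and how quickly older fragments are down-weighted.
    """
    scores = {}
    for c in range(n_classes):
        # Logits for class c from fragments that were trained on it, newest first.
        relevant = [logits[c] for classes, logits in reversed(fragments)
                    if c in classes][:k]
        if not relevant:
            continue  # class not seen by any fragment: no score
        weights = [decay ** i for i in range(len(relevant))]
        scores[c] = sum(w * z for w, z in zip(weights, relevant)) / sum(weights)
    return scores


# Toy example: three fragments trained on overlapping class subsets.
fragments = [
    ({0, 1}, {0: 2.0, 1: 0.0}),
    ({1, 2}, {1: 4.0, 2: 1.0}),
    ({0, 2}, {0: 6.0, 2: 3.0}),
]
scores = class_scores(fragments, n_classes=4)
predicted = max(scores, key=scores.get)
```

Because every class is scored only by fragments that actually saw it, repetition in the stream directly increases how many ensemble members can vote for a class.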

Paper Structure (45 sections, 5 equations, 9 figures, 8 tables, 2 algorithms)

Introduction
Challenge Details
Pre-selection Phase
Final Evaluation Phase
Stream Generator
Selected Challenge Stream Parameters
Participation
Rules and Restrictions
Strategy 1: HAT-CIR
Motivation and Related Work
Method Description
Structural Design
HAT-based Partitioning
Network Replicas
Two-phase Training
...and 30 more sections

Figures (9)

Figure 1: Illustration of transitioning from "standard Class-IL" to "cumulative Class-IL" by increasing the probability of repetition for seen classes, which is set by control parameter $P_r$. Class-IL: class-incremental learning.
Figure 2: Examples of generated streams with a Geometric distribution over the first occurrences of the dataset's classes, and a Zipfian distribution for the repetition of classes along the stream.
Figure 3: Schematic of HAT-CIR during training and test time. HAT-CIR is illustrated here with network replicas and without HAT-based partitioning. (a) When training on a new experience, a new 'fragment' -- consisting of multiple 'ensembles' -- is added to the model and trained on the training data of the new experience in two phases. In the first phase, a projection head is used and a supervised contrastive loss is optimized; in the second phase, a softmax output layer is used and a cross-entropy loss is optimized. (b) During testing, a score for each possible class is computed as a weighted average of the logits from the most recent fragments that were trained on an experience in which that class appeared.
Figure 4: For stream 1 (left) and stream 2 (right), the panels indicate in which experiences Horde decided to add a new feature extractor (FE; indicated by bright colors), and compare the experiences on three factors: the total number of classes in the experience (blue), the number of classes in the experience on which no FE had been trained yet (yellow), and the number of new, unseen classes in the experience (red). The black line represents the total number of classes seen so far.
Figure 5: Schematic of Horde during training and test time. (a) When training on a new experience, in the first phase, a new feature extractor might be trained using both a cross-entropy and a contrastive loss. Whether a new feature extractor is trained on the new experience is decided by a heuristic, see Figure 4. (b) In the second training phase, which is performed on each new experience, a pseudo-feature projection is performed and a unified head is trained to discriminate between all seen classes based on the features from the ensemble. (c) At test time, a test sample is simply forwarded through all components of the model, and the predicted class is read out from the unified head.
...and 4 more figures
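
The stream construction described in the captions of Figures 1 and 2 can be sketched as follows: each class draws its first occurrence from a (truncated) Geometric distribution over experiences, and after that it re-appears in each later experience with a per-class repetition probability, here given a Zipf-like shape. This is an illustrative sketch, not the challenge's generator: the function names, the truncation at the last experience, and the exact form of the Zipf-like probabilities are assumptions, and unlike a real benchmark generator this sketch may produce empty experiences.

```python
import random


def sample_first_occurrences(n_classes, n_experiences, p, rng):
    """Draw the index of the experience in which each class first appears.

    Uses a geometric distribution with success probability p, truncated so
    that every class appears by the last experience (an assumption).
    """
    first = []
    for _ in range(n_classes):
        t = 0
        while t < n_experiences - 1 and rng.random() >= p:
            t += 1
        first.append(t)
    return first


def zipf_probs(n_classes, exponent):
    """Zipf-like repetition probabilities: the class at rank r gets 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_classes + 1)]


def generate_stream(first, p_repeat, n_experiences, rng):
    """Return a list of experiences, each a sorted list of class ids.

    A class is present in its first-occurrence experience, and in every
    later experience independently with probability p_repeat[class].
    """
    stream = []
    for t in range(n_experiences):
        present = [c for c, f in enumerate(first)
                   if f == t or (f < t and rng.random() < p_repeat[c])]
        stream.append(sorted(present))
    return stream


rng = random.Random(0)
first = sample_first_occurrences(n_classes=10, n_experiences=5, p=0.4, rng=rng)

# Repetition probability 0 for all classes: standard class-incremental stream.
standard = generate_stream(first, [0.0] * 10, 5, rng)
# Repetition probability 1 for all classes: cumulative class-incremental stream.
cumulative = generate_stream(first, [1.0] * 10, 5, rng)
# Zipf-like probabilities: a few classes repeat often, most repeat rarely.
zipfian = generate_stream(first, zipf_probs(10, exponent=1.0), 5, rng)
```

Setting all repetition probabilities to 0 or to 1 recovers the two extremes shown in Figure 1; values in between yield streams with partial repetition.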