Balancing the Causal Effects in Class-Incremental Learning

Junhao Zheng; Ruiyan Wang; Chongzhi Zhang; Huawen Feng; Qianli Ma

TL;DR

The paper addresses forgetting in class-incremental learning (CIL) with pretrained transformers by diagnosing a causal imbalance between new and old data. It introduces Balancing the Causal Effects (BaCE), which defines two objectives, Effect_old and Effect_new, to promote positive, balanced causal paths from both $X^{old}$ and $X^{new}$ to predictions across old and new classes, using a teacher–student framework and neighbor-weighted scoring. Empirical results across vision and NLP tasks show that BaCE outperforms strong baselines; ablations confirm that both causal components are necessary, and performance remains robust across challenging datasets. The work provides a causally informed framework for continual learning with PTMs that improves knowledge retention while acquiring new concepts, at the price of higher training costs and sensitivity to buffer size.

Abstract

Class-Incremental Learning (CIL) is a practical and challenging problem for achieving general artificial intelligence. Recently, Pre-Trained Models (PTMs) have led to breakthroughs in both visual and natural language processing tasks. Despite recent studies showing PTMs' potential ability to learn sequentially, a plethora of work indicates the necessity of alleviating the catastrophic forgetting of PTMs. Through a pilot study and a causal analysis of CIL, we reveal that the crux lies in the imbalanced causal effects between new and old data. Specifically, the new data encourage models to adapt to new classes while hindering the adaptation to old classes. Similarly, the old data encourage models to adapt to old classes while hindering the adaptation to new classes. In other words, the adaptation processes for new and old classes conflict from the causal perspective. To alleviate this problem, we propose Balancing the Causal Effects (BaCE) in CIL. Concretely, BaCE proposes two objectives for building causal paths from both new and old data to the prediction of new and old classes, respectively. In this way, the model is encouraged to adapt to all classes with causal effects from both new and old data, which alleviates the causal imbalance problem. We conduct extensive experiments on continual image classification, continual text classification, and continual named entity recognition. Empirical results show that BaCE outperforms a series of CIL methods on different tasks and settings.
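To make the setting concrete, the following is a minimal sketch of a class-incremental split with a small replay buffer, using the 20-step, 100-class, buffer-size-200 configuration that appears in the figure captions. It is not the authors' code. The helper names `split_classes` and `update_buffer`, the random class order, the class-balanced random subsampling, and the toy stand-in samples are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_classes(num_classes, num_steps, seed=0):
    """Partition class ids into equally sized, disjoint incremental steps."""
    if num_classes % num_steps != 0:
        raise ValueError("num_classes must be divisible by num_steps")
    rng = random.Random(seed)
    class_ids = list(range(num_classes))
    rng.shuffle(class_ids)  # assumption: random class order
    per_step = num_classes // num_steps
    return [class_ids[i * per_step:(i + 1) * per_step] for i in range(num_steps)]


def update_buffer(buffer, new_samples, buffer_size, rng):
    """Rebuild a class-balanced buffer with at most buffer_size samples.

    buffer and new_samples are lists of (sample, label) pairs.
    Assumption: each class seen so far gets an equal share of the buffer.
    """
    by_class = defaultdict(list)
    for sample, label in buffer + new_samples:
        by_class[label].append(sample)
    quota = buffer_size // len(by_class)
    rebuilt = []
    for label, samples in by_class.items():
        keep = rng.sample(samples, min(quota, len(samples)))
        rebuilt.extend((s, label) for s in keep)
    return rebuilt


steps = split_classes(num_classes=100, num_steps=20)
rng = random.Random(0)
buffer = []
for step_classes in steps:
    # Toy stand-in data: 10 dummy samples per newly introduced class.
    new_data = [((c, i), c) for c in step_classes for i in range(10)]
    # A replay-style method would train on new data plus the old-class buffer.
    train_set = new_data + buffer
    buffer = update_buffer(buffer, new_data, buffer_size=200, rng=rng)
```

In a real run the new-class data at each step is far larger than the fixed-size buffer, so new data dominates each training set. That is the setting in which the abstract's imbalance between new and old data arises.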

Paper Structure (42 sections, 16 equations, 13 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Class Incremental Learning
Incremental Learning with PTMs
Class imbalance problem
Causal Inference in CV and NLP
Continual Causal Discovery
A Pilot Study for CIL with PTMs
Probing Study
Tracking Study
Methodology
Revisiting the Causalities in CIL
BaCE: Balancing the Causal Effects in CIL
$\textit{Effect}_{\textbf{old}}$: Learning Old Classes with Balanced Causal Effects from $X^{old}$ and $X^{new}$.
$\textit{Effect}_{\textbf{new}}$: Learning New Classes with Balanced Causal Effects from $X^{new}$ and $X^{old}$.
...and 27 more sections

Figures (13)

Figure 1: The probing study on the 20-step split CIFAR-100. We use ViT-B/16 (ViT) pretrained on ImageNet-21k (Dosovitskiy et al., 2020) as the backbone. The buffer size is 200 in REPLAY. The blue curve represents the observed accuracy and the red curve represents the probing accuracy.
Figure 2: The evolution of feature-embedding distance. The backbone model is ViT-B/16, and the dataset is the 20-step split CIFAR-100. Each colour represents the average feature-embedding distance of classes from an incremental task.
Figure 3: (a) and (b) show the relationship between feature-embedding distance and average accuracy and forgetting. (c) shows the feature-embedding distance of old and new tasks. "Average" represents the distance averaged over all incremental steps; "Last" represents the distance measured at the last incremental step; "New", "Old", and "All" represent the distance of new, old, and all classes, respectively.
Figure 4: The illustration of the conflicting causal effects in CIL.
Figure 5: The causal graphs of SEQ, REPLAY, and BaCE (Ours) with $\textit{Effect}_{old}$ and $\textit{Effect}_{new}$ in each CIL step. The directed edges represent the causal effects between variables. The red and green paths represent the positive causal effects of adaptation of old and new classes, respectively. The black paths represent the conflicting causal effects.
...and 8 more figures
