Balancing the Causal Effects in Class-Incremental Learning
Junhao Zheng, Ruiyan Wang, Chongzhi Zhang, Huawen Feng, Qianli Ma
TL;DR
The paper addresses catastrophic forgetting in class-incremental learning (CIL) with pretrained transformers by diagnosing an imbalance between the causal effects of new and old data. It introduces Balancing the Causal Effects (BaCE), which defines two objectives, $Effect_{old}$ and $Effect_{new}$, to promote positive, balanced causal paths from both $X^{old}$ and $X^{new}$ to predictions across old and new classes, using a teacher–student framework and neighbor-weighted scoring. Empirical results across vision and NLP tasks show that BaCE outperforms strong baselines, and ablations indicate that both causal components are necessary and that performance remains robust on challenging datasets. The work provides a causally informed framework for continual learning with pre-trained models (PTMs), improving knowledge retention while acquiring new concepts, albeit with higher training costs and sensitivity to buffer size.
Abstract
Class-Incremental Learning (CIL) is a practical and challenging problem for achieving general artificial intelligence. Recently, Pre-Trained Models (PTMs) have led to breakthroughs in both visual and natural language processing tasks. Despite recent studies showing PTMs' potential ability to learn sequentially, a plethora of work indicates the necessity of alleviating the catastrophic forgetting of PTMs. Through a pilot study and a causal analysis of CIL, we reveal that the crux lies in the imbalanced causal effects between new and old data. Specifically, the new data encourage models to adapt to new classes while hindering adaptation to old classes. Similarly, the old data encourage models to adapt to old classes while hindering adaptation to new classes. In other words, the adaptation processes for new and old classes conflict from the causal perspective. To alleviate this problem, we propose Balancing the Causal Effects (BaCE) in CIL. Concretely, BaCE proposes two objectives for building causal paths from both new and old data to the prediction of new and old classes, respectively. In this way, the model is encouraged to adapt to all classes with causal effects from both new and old data, thereby alleviating the causal imbalance problem. We conduct extensive experiments on continual image classification, continual text classification, and continual named entity recognition. Empirical results show that BaCE outperforms a series of CIL methods on different tasks and settings.
