Class-Independent Increment: An Efficient Approach for Multi-label Class-Incremental Learning

Chenhao Ding, Songlin Dong, Zhengdong Zhou, Jizhou Han, Qiang Wang, Yuhang He, Yihong Gong

TL;DR

The paper tackles multi-label class-incremental learning (MLCIL), targeting two kinds of feature confusion: inter-session and intra-feature. Its class-independent incremental network (CINet) produces per-class embeddings from class-specific tokens and a class-level cross-attention mechanism. Two losses, one of them a multi-label contrastive loss, separate old and new concepts while preserving learned knowledge. The method performs strongly on MS-COCO and PASCAL VOC, including challenging settings and buffer-free scenarios. The key contributions are the CINet architecture, a scalable token-based representation for multi-label samples, and a dedicated loss framework that improves forgetting resistance and recognition accuracy across incremental sessions. The results point to a practical approach to continual learning for real-world multi-label applications, with a reduced memory footprint and improved resilience to feature confusion.

Abstract

Current research on class-incremental learning primarily focuses on single-label classification tasks. However, real-world applications often involve multi-label scenarios, such as image retrieval and medical imaging. Therefore, this paper focuses on the challenging yet practical multi-label class-incremental learning (MLCIL) problem. In addition to the challenge of catastrophic forgetting, MLCIL encounters issues related to feature confusion, encompassing inter-session and intra-feature confusion. To address these problems, we propose a novel MLCIL approach called class-independent increment (CLIN). Specifically, in contrast to existing methods that extract image-level features, we propose a class-independent incremental network (CINet) to extract multiple class-level embeddings for multi-label samples. It learns and preserves the knowledge of different classes by constructing class-specific tokens. On this basis, we develop two novel loss functions, optimizing the learning of class-specific tokens and class-level embeddings, respectively. These losses aim to distinguish between new and old classes, further alleviating the problem of feature confusion. Extensive experiments on MS-COCO and PASCAL VOC datasets demonstrate the effectiveness of our method for improving recognition performance and mitigating forgetting on various MLCIL tasks.

Paper Structure

This paper contains 16 sections, 8 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Comparison between prior work and our work. (a) The prior paradigm for MLCIL. Prior methods employ image-level (task-level) features and train a joint classifier for each session. (b) Our proposed paradigm. We employ a CLIN framework to generate and process class-level features to solve the feature confusion problem. Moreover, we design two loss functions to prevent confusion between old and new classes.
  • Figure 2: The framework of the proposed CLIN. During training, the image $x^t$ undergoes feature extraction by the backbone and is then converted into patch tokens $x_P$. Together with a set of class-specific tokens, this forms the input for the class-level cross-attention module to generate class-level embeddings $E^t$. The set of class-specific tokens includes trainable tokens $\{\boldsymbol{q}_{c}\}_{c=1}^{C_n}$ for new classes and frozen tokens $\{\boldsymbol{q}_{c}\}_{c=1}^{C_o}$ for old classes. Subsequently, $E^t$ is fed separately into the class-independent classifier to compute $\mathcal{L}_{ce}$ and into a projection layer to calculate $\mathcal{L}_{mc}$. (Note: in practice, an input image can contain one or multiple new classes, and it can also contain one, multiple, or no old classes.)
  • Figure 3: Detailed illustration of the Multi-label Contrastive Loss ($\mathcal{L}_{mc}$). $\mathcal{L}_{mc}$ is enforced on positive instances to enhance feature discrimination across new and old concepts.
  • Figure 4: Comparison of the average and last accuracy with different settings of parameters $\alpha$ and $\beta$.
  • Figure 5: Comparison of t-SNE visualizations between other methods and our approach, where each color represents a category.
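
The class-level path described in the Figure 2 caption (patch tokens plus class-specific tokens go through cross-attention to give one embedding per class, which is then scored by a class-independent classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes single-head scaled dot-product attention with no learned key/value projections, one linear scorer per class followed by a sigmoid, and random values in place of a real backbone; all function names, shapes, and the sigmoid scoring are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def class_level_cross_attention(class_tokens, patch_tokens):
    """Each class token queries the patch tokens (assumed single-head attention).

    class_tokens: (C, D), one query per class
    patch_tokens: (N, D), used here as both keys and values
    Returns class-level embeddings (C, D) and attention weights (C, N).
    """
    d = class_tokens.shape[-1]
    scores = class_tokens @ patch_tokens.T / np.sqrt(d)  # (C, N)
    attn = softmax(scores, axis=-1)                      # each row sums to 1
    return attn @ patch_tokens, attn

def class_independent_logits(embeddings, weights, biases):
    """Score class c from its own embedding only (assumed per-class linear scorer)."""
    return np.einsum("cd,cd->c", embeddings, weights) + biases

rng = np.random.default_rng(0)
D, N, C_old, C_new = 16, 49, 3, 2                # hypothetical sizes

old_tokens = rng.normal(size=(C_old, D))         # would be frozen during training
new_tokens = rng.normal(size=(C_new, D))         # would be trainable during training
class_tokens = np.concatenate([old_tokens, new_tokens], axis=0)

patch_tokens = rng.normal(size=(N, D))           # stand-in for backbone output
embeddings, attn = class_level_cross_attention(class_tokens, patch_tokens)

weights = rng.normal(size=(C_old + C_new, D))
biases = np.zeros(C_old + C_new)
logits = class_independent_logits(embeddings, weights, biases)
probs = 1.0 / (1.0 + np.exp(-logits))            # independent per-class probabilities
```

Because each class has its own query token and its own scorer in this sketch, adding classes in a later session only appends rows to `class_tokens` and `weights`; the rows for earlier classes can stay untouched, which is the property the frozen old-class tokens in the caption rely on.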