Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype

Yadong Lu, Shitian Zhao, Boxiang Yun, Dongsheng Jiang, Yin Li, Qingli Li, Yan Wang

TL;DR

This paper proposes a training-free Task-ID discriminator that uses prototypes as classifiers to identify Task-IDs, and incorporates intra-domain category-aware prototypes as domain prior prompts during training.

Abstract

Despite recent progress in enhancing the efficacy of Open-Domain Continual Learning (ODCL) in Vision-Language Models (VLMs), a method that fails to (1) correctly identify the Task-ID of a test image and (2) use only the category set corresponding to that Task-ID, while preserving the knowledge related to each domain, cannot address the two primary challenges of ODCL, forgetting old knowledge and maintaining zero-shot capabilities, as well as the confusion caused by category-relatedness between domains. In this paper, we propose a simple yet effective solution: leveraging intra-domain category-aware prototypes for ODCL in CLIP (DPeCLIP), where the prototype is the key to bridging the above two processes. Concretely, we propose a training-free Task-ID discriminator that uses prototypes as classifiers to identify Task-IDs. Furthermore, to maintain the knowledge corresponding to each domain, we incorporate intra-domain category-aware prototypes as domain prior prompts into the training process. Extensive experiments on 11 different datasets demonstrate the effectiveness of our approach, which achieves average improvements of 2.37% and 1.14% in the class-incremental and task-incremental settings, respectively.
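
The idea of using prototypes as a training-free Task-ID classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of cosine similarity, the mean-feature definition of a prototype, and the random toy features (standing in for CLIP image features) are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def build_prototypes(features, labels):
    # Assumed prototype definition: the mean feature of each category.
    classes = np.unique(labels)
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, l2_normalize(protos)

def predict_task_id(queries, prototypes, proto_task_ids):
    # The nearest prototype (by cosine similarity) decides the Task-ID.
    # No parameters are trained at any point.
    sims = l2_normalize(queries) @ prototypes.T
    return proto_task_ids[np.argmax(sims, axis=-1)]

# Toy data: 4 categories in a 4-d feature space, 10 samples per category.
rng = np.random.default_rng(0)
centers = 5.0 * np.eye(4)
labels = np.repeat(np.arange(4), 10)
features = centers[labels] + 0.1 * rng.standard_normal((40, 4))

classes, prototypes = build_prototypes(features, labels)
# Hypothetical domain split: categories 0-1 belong to task 0, categories 2-3 to task 1.
proto_task_ids = np.array([0, 0, 1, 1])

# One noisy query per category.
queries = centers + 0.1 * rng.standard_normal((4, 4))
task_pred = predict_task_id(queries, prototypes, proto_task_ids)
print(task_pred)
```

In this toy setup the categories are well separated, so each query is assigned to the task that owns its nearest category prototype. Real image features overlap far more, which is why the quality of the prototypes matters.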

Paper Structure

This paper contains 35 sections, 5 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: (a) Comparison of two approaches to the ODCL-CIL task: the first classifies over all seen categories, while the second uses a Task-ID discriminator to select the corresponding category set for classification. (b) Comparison of Task-ID classification accuracy between MoE-Adapters and our method. (c) Comparison of our method with other methods on the ODCL-CIL task. The ODCL task requires evaluation on both seen and unseen datasets. The black dashed vertical line marks test datasets that have not been trained on; these are assessed through the model's zero-shot capability.
  • Figure 2: Domain Prototype enhanced CLIP (DPeCLIP) framework, consisting of three stages: (a) Prototype Calculation, where category-aware prototypes are extracted using the original CLIP. (b) Training, where we propose Text Self-Attention (TSA) and Image Cross-Attention (ICA) to provide domain prior prompts, with prototypes as input. (c) Inference, where we use the prototypes to determine Task-IDs for test images and employ the corresponding domain components for classification.
  • Figure 3: (a) Influence of prompt length. (b) Influence of prompt replacement depth.
  • Figure 4: Comparison of t-SNE between MoE-Adapters and our method for the Task-ID discriminator.
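
The two approaches contrasted in Figure 1(a) can be sketched on toy scores. This is only an illustration under stated assumptions: the logits, the category-to-task mapping, and both function names are hypothetical, and the sketch covers only the category-set restriction, not the domain-specific components used at inference in Figure 2(c).

```python
import numpy as np

def classify_all_seen(logits):
    # Approach 1: pick the best-scoring category among all seen categories.
    return int(np.argmax(logits))

def classify_with_task_id(logits, category_task_ids, task_id):
    # Approach 2: mask out categories from other tasks, then pick the best
    # remaining category.
    masked = np.where(category_task_ids == task_id, logits, -np.inf)
    return int(np.argmax(masked))

# Hypothetical scores for one test image whose true category is 1 (task 0).
# Category 2 belongs to task 1 but is semantically related and scores higher.
logits = np.array([0.2, 0.6, 0.9, 0.1])
category_task_ids = np.array([0, 0, 1, 1])

pred_all = classify_all_seen(logits)
pred_routed = classify_with_task_id(logits, category_task_ids, task_id=0)
print(pred_all, pred_routed)  # prints "2 1"
```

With all seen categories in play, the related category from the other domain wins. Restricting the choice to the category set of the identified Task-ID recovers the intended category, provided the Task-ID itself is correct.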