PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning

Xianghu Yue, Yiming Chen, Xueyi Zhang, Xiaoxue Gao, Mengling Feng, Mingrui Lao, Huiping Zhuang, Haizhou Li

TL;DR

PAL tackles MMCIL with missing modalities by combining modality-specific prompts with a recursive least-squares (RLS)-based analytic learning module. By freezing the backbone and solving a closed-form ridge-regression update, PAL mitigates under-fitting while preserving holistic representations across modality gaps. Empirical results on UPMC-Food101 and N24News show that PAL surpasses state-of-the-art baselines in average accuracy and forgetting across varying missing rates and large-step scenarios. The approach demonstrates robust, exemplar-free incremental learning with scalable performance, and it sets the stage for a future extension to tri-modal data.

Abstract

Multi-modal class-incremental learning (MMCIL) leverages multi-modal data, such as audio-visual and image-text pairs, so that models can learn continuously across a sequence of tasks while mitigating forgetting. Existing studies focus primarily on integrating and exploiting multi-modal information for MMCIL, but a critical challenge remains: modalities may be missing during incremental learning phases. Overlooking this issue can exacerbate forgetting and significantly impair model performance. To bridge this gap, we propose PAL, a novel exemplar-free framework tailored to MMCIL under missing-modality scenarios. Concretely, we devise modality-specific prompts to compensate for missing information, helping the model maintain a holistic representation of the data. On this foundation, we reformulate the MMCIL problem as a Recursive Least-Squares task, which yields an analytical linear solution. Building on these components, PAL not only alleviates the under-fitting limitation inherent in analytic learning but also preserves the holistic representation of missing-modality data, achieving superior performance with less forgetting across various multi-modal incremental scenarios. Extensive experiments on datasets including UPMC-Food101 and N24News demonstrate that PAL significantly outperforms competitive methods, showcasing its robustness to modality absence and its ability to resist forgetting while maintaining high incremental accuracy.
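To make the Recursive Least-Squares idea concrete, here is a minimal sketch of an exemplar-free, closed-form update for a linear classifier on top of frozen features. It is not the paper's implementation: the class name `RLSClassifier`, the regularizer `gamma`, the `add_classes` helper, and the toy feature vectors are all assumptions made for illustration, and the prompts and backbone are left out entirely. The sketch uses one-sample Sherman-Morrison updates so that only a matrix `R`, which tracks the inverse of the regularized feature autocorrelation, has to be kept between phases instead of past samples.

```python
# Illustrative RLS sketch (hypothetical names; not the authors' code).
# Features are assumed to come from a frozen backbone; here they are toy lists.

class RLSClassifier:
    def __init__(self, dim, num_classes, gamma=1.0):
        self.dim = dim
        self.num_classes = num_classes
        # R tracks (X^T X + gamma * I)^{-1}; with no data seen it is I / gamma.
        self.R = [[(1.0 / gamma if i == j else 0.0) for j in range(dim)]
                  for i in range(dim)]
        # W maps a feature vector (dim) to class scores (num_classes).
        self.W = [[0.0] * num_classes for _ in range(dim)]

    def add_classes(self, n):
        # New classes in a later phase start from zero weight columns.
        for row in self.W:
            row.extend([0.0] * n)
        self.num_classes += n

    def update(self, x, label):
        d = self.dim
        # Rx = R @ x (R is symmetric, so this is also x^T R).
        Rx = [sum(self.R[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(x[i] * Rx[i] for i in range(d))
        # Sherman-Morrison: R <- R - (R x)(R x)^T / (1 + x^T R x).
        for i in range(d):
            for j in range(d):
                self.R[i][j] -= Rx[i] * Rx[j] / denom
        # Gain vector k = R_new @ x, which simplifies to Rx / denom.
        k = [v / denom for v in Rx]
        # W <- W + k (y - W^T x)^T, with y the one-hot label.
        for c in range(self.num_classes):
            pred = sum(self.W[i][c] * x[i] for i in range(d))
            err = (1.0 if c == label else 0.0) - pred
            for i in range(d):
                self.W[i][c] += k[i] * err

    def predict(self, x):
        scores = [sum(self.W[i][c] * x[i] for i in range(self.dim))
                  for c in range(self.num_classes)]
        return max(range(self.num_classes), key=lambda c: scores[c])


# Two incremental phases; phase 1 data is never revisited in phase 2.
phase1 = [([1.0, 0.0, 0.5], 0), ([0.0, 1.0, 0.5], 1)]
phase2 = [([0.5, 0.5, 1.0], 2), ([1.0, 1.0, 0.0], 2)]

clf = RLSClassifier(dim=3, num_classes=2, gamma=0.1)
for x, y in phase1:
    clf.update(x, y)
clf.add_classes(1)
for x, y in phase2:
    clf.update(x, y)
```

Under these assumptions, the weights after the second phase satisfy the same normal equations as a ridge regression fitted jointly on both phases, which is the sense in which a recursive analytic update avoids forgetting without storing exemplars.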

Paper Structure

This paper contains 32 sections, 30 equations, 9 figures, 5 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Illustration of multi-modal class-incremental learning (MMCIL) with missing modality. Previous works (top) focus on leveraging multi-modal information from modality-complete data to mitigate forgetting. In contrast, our work (bottom) studies a more general scenario in which modality-missing cases vary not only across data samples but also across incremental learning phases.
  • Figure 2: The basic backbone of PAL, including a pre-trained multi-modal transformer, two modality-specific prompt pools (i.e., image and text), and a linear classifier.
  • Figure 3: Conceptual illustration of the proposed PAL framework during multi-modal CIL, which consists of two training steps. The first step trains the classifier via back-propagation (BP), and the second step re-trains the classifier via analytic learning (AL). For simplicity, we omit the modality-specific prompt pool.
  • Figure 4: Testing accuracy at each incremental step on the UPMC-Food101 dataset, when the missing rate $\eta$ is 0.1, 0.5 and 0.7.
  • Figure 5: Testing accuracy at each incremental step on the N24News dataset, when the missing rate $\eta$ is 0.1, 0.5 and 0.7.
  • ...and 4 more figures