EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities

Zhe Chen, Xun Lin, Yawen Cui, Zitong Yu

TL;DR

This work tackles missing modalities in multimodal learning by introducing Evidence-based Parameter-Efficient Prompting (EPE-P), a compact prompting framework that uses a single comprehensive prompt plus modality-specific weight matrices. The approach uses a Block-wise Kronecker-like Multiplication (BK-M) to tailor the prompt to each missing-case input and integrates the resulting prompts into early transformer layers, complemented by an evidential deep learning loss that captures uncertainty and improves decision-making. Key contributions include the BK-M-based prompt design with low-rank factorization, an evidence-based loss with a KL regularizer, and experiments on MM-IMDb and Hateful Memes showing improved robustness and efficiency. The results indicate that EPE-P reduces parameter redundancy while outperforming prior prompting methods, making it practical for real-world multimodal systems with incomplete data.

Abstract

Missing modalities are a common challenge in real-world multimodal learning scenarios, occurring during both training and testing. Existing methods for managing missing modalities often require the design of separate prompts for each modality or missing case, leading to complex designs and a substantial increase in the number of parameters to be learned. As the number of modalities grows, these methods become increasingly inefficient due to parameter redundancy. To address these issues, we propose Evidence-based Parameter-Efficient Prompting (EPE-P), a novel and parameter-efficient method for pretrained multimodal networks. Our approach introduces a streamlined design that integrates prompting information across different modalities, reducing complexity and mitigating redundant parameters. Furthermore, we propose an Evidence-based Loss function to better handle the uncertainty associated with missing modalities, improving the model's decision-making. Our experiments demonstrate that EPE-P outperforms existing prompting-based methods in terms of both effectiveness and efficiency. The code is released at https://github.com/Boris-Jobs/EPE-P_MLLMs-Robustness.

Paper Structure

This paper contains 11 sections, 8 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Comparison of our approach with existing methods. For a scenario with $m$ modalities, MAP [lee2023multimodalpromptingmissingmodalities] requires designing a prompt for each missing case, resulting in a total of $2^m-1$ prompts. MSP [10447257], on the other hand, requires one prompt per modality, totaling $m$ prompts. Our method needs only a single comprehensive prompt along with $m$ prompt weight matrices, each significantly smaller than a full prompt. This highlights the efficiency of our proposed EPE-P approach.
  • Figure 2: Overview of our proposed EPE-P approach for multimodal learning with missing modalities.
  • Figure 3: Quantitative results of the proposed EPE-P on the Hateful Memes [kiela2020hateful] dataset with varying missing rates. The evaluation was conducted on a test set with a 25% missing rate for both text and images.
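
The prompt counts quoted in the Figure 1 caption can be checked with a small sketch. It only reproduces the counting argument ($2^m-1$ prompts for MAP, $m$ for MSP, one prompt plus $m$ weight matrices for EPE-P); it is not taken from the authors' code. The function names `input_cases` and `prompt_counts` are hypothetical, and reading a "missing case" as a non-empty subset of available modalities is an assumption made here to match the $2^m-1$ figure.

```python
from itertools import combinations


def input_cases(modalities):
    """Enumerate every non-empty subset of available modalities.

    Assumption: each subset is one input case that a per-case method
    would need its own prompt for (this yields 2^m - 1 cases).
    """
    cases = []
    for k in range(1, len(modalities) + 1):
        cases.extend(combinations(modalities, k))
    return cases


def prompt_counts(m):
    """Number of learnable prompt objects per method, per the Figure 1 caption."""
    if m < 1:
        raise ValueError("m must be a positive number of modalities")
    return {
        "MAP_prompts": 2 ** m - 1,       # one prompt per input case
        "MSP_prompts": m,                # one prompt per modality
        "EPE_P_prompts": 1,              # a single comprehensive prompt
        "EPE_P_weight_matrices": m,      # plus m small prompt weight matrices
    }


if __name__ == "__main__":
    print(input_cases(["text", "image"]))
    for m in (2, 3, 5):
        print(m, prompt_counts(m))
```

The count for the per-case design grows exponentially in $m$, while the other two grow linearly; the caption's further claim that each EPE-P weight matrix is much smaller than a full prompt is not modeled here, since the matrix shapes are not given in this section.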