MIDG: Mixture of Invariant Experts with knowledge injection for Domain Generalization in Multimodal Sentiment Analysis

Yangle Li, Danli Luo, Haifeng Hu

TL;DR

This work tackles domain generalization in Multimodal Sentiment Analysis by addressing two gaps: insufficient inter-modal synergy in invariant feature extraction and fragmented cross-modal knowledge injection. It proposes MIDG, a framework that combines a Mixture of Invariant Experts (MoIE) to learn domain-invariant multimodal features with a Cross-Modal Adapter to inject cross-modal knowledge into unimodal representations, trained using both in-domain data and simulated out-of-domain data created via an information entropy disentanglement module. The approach is validated on MOSI, MOSEI, and CH-SIMS, showing superior or competitive performance against state-of-the-art baselines and demonstrating strong generalization to unseen domains. The results highlight the importance of jointly modeling cross-modal interactions and cross-modal knowledge injection for robust, domain-general MSA.
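The summary does not say how the Mixture of Invariant Experts (MoIE) routes features. The block below is a rough illustration only. It assumes a dense, softmax-gated mixture of linear experts applied to the concatenation of text, audio, and vision features, so every expert sees all modalities jointly. The names `mixture_of_experts`, `experts`, and `gate` are hypothetical. The objective that makes the experts domain-invariant is not described here, so it is omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # list of rows; len(row) == input dimension


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(text: Vector, audio: Vector, vision: Vector,
                       experts: List[Matrix], gate: Matrix) -> Vector:
    """Hypothetical dense MoE over fused multimodal features.

    The three modality vectors are concatenated, a linear gate scores each
    expert, and the output is the gate-weighted sum of the expert outputs.
    No invariance loss is modeled here.
    """
    fused = text + audio + vision              # list concatenation of modalities
    weights = softmax(matvec(gate, fused))     # one mixing weight per expert
    outputs = [matvec(w, fused) for w in experts]
    dim = len(outputs[0])
    return [sum(g * out[d] for g, out in zip(weights, outputs)) for d in range(dim)]


# Usage: expert 0 reads only the text feature, expert 1 only the vision feature.
# An all-zero gate gives uniform weights, so the output is their average.
experts = [[[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]]
gate = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
result = mixture_of_experts([2.0], [0.0], [4.0], experts, gate)  # result == [3.0]
```

A learned gate would instead shift weight toward whichever expert best fits the fused input. That routing is what lets different experts specialize while still seeing all modalities.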

Abstract

Existing methods in domain generalization for Multimodal Sentiment Analysis (MSA) often overlook inter-modal synergies during invariant feature extraction, which prevents them from accurately capturing the rich semantic information within multimodal data. Additionally, while knowledge injection techniques have been explored in MSA, they often suffer from fragmented cross-modal knowledge, overlooking specific representations that exist beyond the confines of a single modality. To address these limitations, we propose a novel MSA framework designed for domain generalization. First, the framework incorporates a Mixture of Invariant Experts model to extract domain-invariant features, thereby enhancing the model's capacity to learn synergistic relationships between modalities. Second, we design a Cross-Modal Adapter to augment the semantic richness of multimodal representations through cross-modal knowledge injection. Extensive domain generalization experiments on three datasets (MOSI, MOSEI, and CH-SIMS) demonstrate that the proposed MIDG achieves superior performance.
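The abstract does not specify the internals of the Cross-Modal Adapter. One common way to inject information into a representation is a residual bottleneck adapter, and the sketch below borrows that pattern as an assumption. The other modalities are concatenated, projected down, passed through ReLU, projected back up to the target dimension, and added to the unimodal vector. The function `cross_modal_adapter` and its `down`, `up`, and `scale` parameters are hypothetical and not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # list of rows; len(row) == input dimension


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def cross_modal_adapter(target: Vector, others: List[Vector],
                        down: Matrix, up: Matrix, scale: float = 1.0) -> Vector:
    """Hypothetical residual bottleneck adapter for cross-modal injection.

    `target` is one unimodal representation (e.g. text). `others` holds the
    remaining modalities, whose information is injected into `target`.
    """
    context = [v for vec in others for v in vec]   # concatenate other modalities
    bottleneck = relu(matvec(down, context))       # down-projection + ReLU
    injected = matvec(up, bottleneck)              # up-projection to target dim
    # The residual connection keeps the original unimodal signal intact.
    return [t + scale * i for t, i in zip(target, injected)]


# Usage: inject audio and vision context into a 2-dim text vector.
text, audio, vision = [1.0, 1.0], [0.5], [1.5]
down = [[1.0, 1.0]]          # 2 -> 1 bottleneck
up = [[0.5], [0.0]]          # 1 -> 2 back to the text dimension
enriched_text = cross_modal_adapter(text, [audio, vision], down, up)
```

The residual form means that an all-zero `up` matrix leaves the unimodal vector unchanged. This is one reason adapters of this shape are often inserted into pretrained encoders.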

Paper Structure

This paper contains 15 sections, 6 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Framework of MIDG. (a): Data preparation pipeline using the entropy decoupling module. (b): Pipeline of the MSA task, divided into in-domain and out-of-domain data flows. The final output of the model is a weighted sum of the results from the two flows.
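The caption states that the final prediction is a weighted sum of the in-domain and out-of-domain flows, but it does not give the weights. The sketch below assumes a single mixing weight `alpha` in [0, 1] applied per sample, so the two weights sum to one. Both `combine_flows` and `alpha` are hypothetical names. The paper may instead use fixed, tuned, or learned weights.

```python
from typing import List


def combine_flows(in_domain: List[float], out_of_domain: List[float],
                  alpha: float = 0.5) -> List[float]:
    """Weighted sum of per-sample predictions from the two data flows.

    Assumes convex weights: alpha for the in-domain flow and (1 - alpha)
    for the out-of-domain flow.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(in_domain) != len(out_of_domain):
        raise ValueError("both flows must score the same samples")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(in_domain, out_of_domain)]


# Usage: sentiment scores for two samples from each flow.
final_scores = combine_flows([1.0, -2.0], [3.0, 0.0], alpha=0.25)  # [2.5, -0.5]
```

Restricting the weights to sum to one keeps the combined score on the same scale as each flow's output. An unconstrained weighted sum would also match the caption's wording.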