Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition
Shuo Zhang, Jinsong Zhang, Zhejun Zhang, Lei Li
TL;DR
This work tackles the joint problem of multimodal sentiment analysis and emotion recognition by addressing parameter conflicts that arise from naive parameter sharing. It introduces Multimodal Mixture of Low-Rank Experts (MMoLRE), which uses shared and task-specific low-rank experts plus a UniTSE module and task-adaptive fusion to model commonalities and differences between tasks while keeping computational costs low. The approach achieves state-of-the-art results on MSA benchmarks (CMU-MOSI and CMU-MOSEI) and competitive performance on MER, supported by extensive ablations and parameter studies. Overall, MMoLRE demonstrates that explicit task separation with low-rank MoE can enhance multi-task learning for multimodal affective computing with substantial parameter savings and scalable capacity.
Abstract
Multi-task learning (MTL) enables the efficient transfer of extra knowledge acquired from other tasks. The high correlation between multimodal sentiment analysis (MSA) and multimodal emotion recognition (MER) supports their joint training. However, existing methods primarily employ hard parameter sharing, ignoring parameter conflicts caused by complex task correlations. In this paper, we present a novel MTL method for MSA and MER, termed Multimodal Mixture of Low-Rank Experts (MMoLRE). MMoLRE utilizes shared and task-specific experts to distinctly model common and unique task characteristics, thereby avoiding parameter conflicts. Additionally, inspired by low-rank structures in the Mixture of Experts (MoE) framework, we design low-rank expert networks to reduce parameter and computational overhead as the number of experts increases. Extensive experiments on the CMU-MOSI and CMU-MOSEI benchmarks demonstrate that MMoLRE achieves state-of-the-art performance on the MSA task and competitive results on the MER task.
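The shared and task-specific low-rank expert structure described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the expert form, the routing, or any sizes, so everything here is an assumption made for illustration: each expert is a linear map factored into two thin matrices, each task mixes its experts through a softmax gate, and all names and dimensions (`make_low_rank_expert`, `task_output`, `d_in`, `rank`, and so on) are hypothetical. The UniTSE module and task-adaptive fusion are not modeled.

```python
import math
import random


def make_low_rank_expert(d_in, d_out, rank, rng):
    """Return factors (A, B) of a low-rank linear expert: x -> (x A) B."""
    A = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(d_in)]
    B = [[rng.gauss(0.0, 0.1) for _ in range(d_out)] for _ in range(rank)]
    return A, B


def vecmat(x, M):
    """Multiply row vector x (length n) by matrix M (n x m)."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]


def expert_forward(x, expert):
    """Apply a low-rank expert: project down to `rank`, then up to d_out."""
    A, B = expert
    return vecmat(vecmat(x, A), B)


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def task_output(x, shared, specific, gate):
    """Mix shared and task-specific experts with a per-task softmax gate.

    The gate is an assumption of this sketch, not taken from the paper.
    """
    experts = shared + specific
    weights = softmax(vecmat(x, gate))
    outs = [expert_forward(x, e) for e in experts]
    return [sum(w * o[j] for w, o in zip(weights, outs)) for j in range(len(outs[0]))]


def expert_params(d_in, d_out, rank=None):
    """Parameter count of one expert: dense if rank is None, else factored."""
    return d_in * d_out if rank is None else rank * (d_in + d_out)


# Hypothetical sizes chosen only for this illustration.
d_in, d_out, rank = 16, 16, 2
n_shared, n_specific = 2, 1
rng = random.Random(0)

shared = [make_low_rank_expert(d_in, d_out, rank, rng) for _ in range(n_shared)]
msa_experts = [make_low_rank_expert(d_in, d_out, rank, rng) for _ in range(n_specific)]
mer_experts = [make_low_rank_expert(d_in, d_out, rank, rng) for _ in range(n_specific)]

# One gate matrix per task, with one column per expert visible to that task.
n_visible = n_shared + n_specific
msa_gate = [[rng.gauss(0.0, 0.1) for _ in range(n_visible)] for _ in range(d_in)]
mer_gate = [[rng.gauss(0.0, 0.1) for _ in range(n_visible)] for _ in range(d_in)]

x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]  # stand-in for a fused multimodal feature
msa_out = task_output(x, shared, msa_experts, msa_gate)
mer_out = task_output(x, shared, mer_experts, mer_gate)

dense_count = expert_params(d_in, d_out)           # 16 * 16 = 256
low_rank_count = expert_params(d_in, d_out, rank)  # 2 * (16 + 16) = 64
```

In this sketch both tasks read the same shared experts, while each task's own experts are never touched by the other task, which is one way to keep task-specific parameters free of conflicting updates. The factored form costs `rank * (d_in + d_out)` parameters per expert instead of `d_in * d_out`, so adding experts grows the parameter count more slowly when the rank is small.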
