Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning

Songning Lai, Jiakang Li, Guinan Guo, Xifeng Hu, Yulong Li, Yuan Tan, Zichen Song, Yutong Liu, Zhaoxia Ren, Chun Wan, Danmin Miao, Zhi Liu

TL;DR

This work proposes a novel deep modal shared information learning module that utilizes the covariance matrix to capture shared information across modalities and introduces a label generation module based on a self-supervised learning strategy to capture the private information specific to each modality.

Abstract

Designing effective representation learning methods is a crucial research direction in multimodal sentiment analysis. The challenge lies in learning both shared and private information within a complete modal representation, which is difficult with uniform multimodal labels and raw feature fusion alone. In this work, we propose a deep modal shared information learning module based on the covariance matrix to capture the information shared between modalities. Additionally, we use a label generation module based on a self-supervised learning strategy to capture the private information of each modality. Our module is plug-and-play in multimodal tasks: by changing its parameterization, it can adjust the information exchange relationship between modalities and learn either the private or the shared information between specified modalities. We also employ a multi-task learning strategy to help the model focus on training samples in which the modalities differ. We provide a detailed derivation and a feasibility proof for the design of the deep modal shared information learning module. We conduct extensive experiments on three common multimodal sentiment analysis benchmark datasets, and the results validate the reliability of our model. Furthermore, we explore further combinations in which the module can be used. Our approach outperforms current state-of-the-art methods on most metrics across the three public datasets.
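
The abstract does not spell out how the covariance matrix is used, so the following is only a minimal sketch of one common covariance-alignment objective (a CORAL-style loss), not the paper's formulation. It assumes two modalities have already been projected to the same feature dimension; the function names, the toy "text" and "audio" features, and the 1/(4d^2) scaling are illustrative assumptions.

```python
import numpy as np


def covariance(features):
    """Unbiased covariance matrix of a (batch, dim) feature array."""
    n = features.shape[0]
    centered = features - features.mean(axis=0, keepdims=True)
    return centered.T @ centered / (n - 1)


def covariance_alignment_loss(feat_a, feat_b):
    """Squared Frobenius distance between the two covariance matrices.

    The 1 / (4 d^2) factor follows the usual CORAL convention; it is an
    assumption here, not something stated in the abstract.
    """
    d = feat_a.shape[1]
    diff = covariance(feat_a) - covariance(feat_b)
    return float(np.sum(diff ** 2) / (4.0 * d * d))


# Toy features standing in for two modalities (hypothetical data).
rng = np.random.default_rng(0)
text_feat = rng.normal(size=(32, 8))
audio_feat = 3.0 * rng.normal(size=(32, 8))  # different scale, so different covariance

loss_same = covariance_alignment_loss(text_feat, text_feat)
loss_diff = covariance_alignment_loss(text_feat, audio_feat)
print(loss_same, loss_diff)
```

In this sketch the loss is zero when both inputs have the same covariance and grows as their second-order statistics diverge, so minimizing it pulls the two feature distributions toward a common structure. Because the features are mean-centered first, a constant offset between modalities does not affect the loss.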

Paper Structure (16 sections, 35 equations, 3 figures, 3 tables)

This paper contains 16 sections, 35 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Illustration of the multimodal sentiment analysis task.
  • Figure 2: Overall model architecture. The model comprises feature extraction modules for each modality, a self-supervised unimodal label generation module, a deep modal shared information learning module, and a multimodal sentiment analysis output module. Together, these modules extract features from each modality, generate unimodal labels through self-supervised learning, capture the information shared across modalities, and produce sentiment predictions from the multimodal input.
  • Figure 3: Flowchart of the complete model architecture.