Boomda: Balanced Multi-objective Optimization for Multimodal Domain Adaptation

Jun Sun, Xinxin Zhang, Simin Hong, Jian Zhu, Xiang Gao

TL;DR

Boomda tackles heterogeneous multimodal domain adaptation by learning each modality's representation independently with the information bottleneck and aligning source and target domains with correlation alignment. It casts modality balancing as a multi-objective optimization problem with a Pareto-optimal target, solved via the multiple-gradient descent algorithm (MGDA), and then reduces the resulting quadratic program to a closed-form weighting by approximating its matrix Q as diagonal. Experiments on IEMOCAP and MSP-IMPROV show improvements over the competing baselines, with ablations supporting the contributions of balanced correlation alignment and pseudo labeling. The closed-form solution keeps the balancing step cheap, and the code is publicly available.
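To make the closed-form weighting more concrete, here is a minimal sketch of what a diagonal approximation does to an MGDA-style min-norm problem. It assumes Q is the Gram matrix of per-modality alignment gradients, as in the usual MGDA formulation, so that the weights solve min α^T Q α over the probability simplex. The source summary does not define Q or state the paper's exact expression, so the function name `diagonal_min_norm_weights` and its input are illustrative and may differ from Theorem 1 of the paper.

```python
import numpy as np

def diagonal_min_norm_weights(grad_sq_norms):
    """Minimize sum_i q_i * a_i**2 subject to sum_i a_i = 1, a_i >= 0.

    Assumption: q_i is the i-th diagonal entry of Q (for a Gram matrix,
    the squared gradient norm of objective i). With off-diagonal terms
    dropped, the Lagrangian condition 2 * q_i * a_i = lambda gives
    a_i proportional to 1 / q_i, which is automatically non-negative.
    """
    q = np.asarray(grad_sq_norms, dtype=float)
    if q.ndim != 1 or q.size == 0:
        raise ValueError("expected a non-empty 1-D array of diagonal entries")
    if np.any(q <= 0):
        raise ValueError("diagonal entries must be strictly positive")
    inv = 1.0 / q
    return inv / inv.sum()

# Hypothetical example: two modalities whose gradients have squared norms 1 and 4.
weights = diagonal_min_norm_weights([1.0, 4.0])
```

Under this approximation, the modality with the larger gradient norm receives the smaller weight, which is one way to keep a single modality from dominating the shared update.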

Abstract

Multimodal learning, while contributing to numerous success stories across various fields, faces the challenge of prohibitively expensive manual annotation. To address the scarcity of annotated data, a popular solution is unsupervised domain adaptation, which has been extensively studied in unimodal settings yet remains less explored in multimodal settings. In this paper, we investigate heterogeneous multimodal domain adaptation, where the primary challenge is the varying domain shifts of different modalities from the source to the target domain. We first introduce the information bottleneck method to learn representations for each modality independently, and then match the source and target domains in the representation space with correlation alignment. To balance the domain alignment of all modalities, we formulate the problem as a multi-objective task, aiming for a Pareto optimal solution. By exploiting the properties specific to our model, the problem can be simplified to a quadratic programming problem. Further approximation yields a closed-form solution, leading to an efficient modality-balanced multimodal domain adaptation algorithm. The proposed method, Balanced multi-objective optimization for multimodal domain adaptation, is termed Boomda. Extensive empirical results showcase the effectiveness of the proposed approach and demonstrate that Boomda outperforms the competing schemes. The code is available at: https://github.com/sunjunaimer/Boomda.git.
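The correlation alignment step mentioned in the abstract can be illustrated with a short sketch. It follows the commonly used CORAL formulation, in which the loss is the squared Frobenius distance between source and target feature covariances scaled by 1/(4d^2). Whether Boomda uses exactly this scaling or applies it per modality in this way is an assumption, and the function name `coral_loss` is illustrative rather than taken from the released code.

```python
import numpy as np

def coral_loss(source, target):
    """CORAL-style distance between second-order statistics of two domains.

    source, target: arrays of shape (n_samples, d) with d >= 2, holding the
    representations of one modality in each domain (assumed layout).
    """
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False)  # d x d source covariance
    cov_t = np.cov(target, rowvar=False)  # d x d target covariance
    return float(np.sum((cov_s - cov_t) ** 2) / (4.0 * d * d))

# Hypothetical data: the target features are a rescaled copy of the source.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 4))
loss_same = coral_loss(src, src)
loss_shifted = coral_loss(src, 3.0 * src)
```

Because the loss depends only on covariances, it ignores a pure mean shift between domains and reacts to changes in feature scale and correlation. In a multimodal setting, one such loss per modality gives the set of objectives that a balancing scheme then has to weight.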

Paper Structure

This paper contains 17 sections, 1 theorem, 31 equations, 4 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Problem P4 admits a closed-form solution:

Figures (4)

  • Figure 1: Domain shift in the context of multimodal emotion recognition. The example sample is drawn from the IEMOCAP dataset (busso2008iemocap).
  • Figure 2: Model framework with 2 modalities as an example (multimodal representation $\bm{Z}_3$ is a concatenation of $\bm{Z}_1$ and $\bm{Z}_2$; solid and dashed regular arrows represent the flows of source and target domains, respectively; double-headed arrows represent alignment or supervision signals, corresponding to the information bottleneck loss $\mathcal{L}^{I \!B}(\bm{\theta})$, pseudo label supervision loss $\mathcal{L}^{P\!L}(\bm{\theta})$ and correlation alignment loss $\mathcal{L}^{C\!A}(\bm{\theta})$).
  • Figure 3: The training dynamics on the IEMOCAP dataset.
  • Figure 4: The pseudo labeling during training ((a) and (b) are the results on the IEMOCAP dataset; (c) and (d) are the results on the MSP-IMPROV dataset).

Theorems & Definitions (3)

  • Definition 1
  • Theorem 1
  • proof