
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao

TL;DR

The paper presents Classifier-Guided Gradient Modulation (CGGM), a novel method for balancing multimodal learning that considers both the magnitude and the direction of the gradients.

Abstract

Multimodal learning has developed rapidly in recent years. However, during multimodal training, the model tends to rely on a single modality, the one from which it can learn fastest, which leads to inadequate use of the other modalities. Existing methods for balancing the training process always impose some restrictions on the loss function, the optimizer, or the number of modalities, and they modulate only the magnitude of the gradients while ignoring their directions. To address these problems, we present Classifier-Guided Gradient Modulation (CGGM), a novel method for balancing multimodal learning that considers both the magnitude and the direction of the gradients. We conduct extensive experiments on four multimodal datasets (UPMC-Food 101, CMU-MOSI, IEMOCAP, and BraTS 2021) covering classification, regression, and segmentation tasks. The results show that CGGM consistently outperforms all baselines and other state-of-the-art methods, demonstrating its effectiveness and versatility. Our code is available at https://github.com/zrguo/CGGM.
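The abstract's point that magnitude-only balancing ignores gradient direction can be checked numerically. The minimal sketch below (plain Python) rescales a unimodal gradient by a positive coefficient, as magnitude-based balancing does, and shows that its Euclidean norm changes while its cosine similarity to the fusion gradient does not. The toy vectors `g_audio` and `g_fusion`, the coefficient `k`, and all helper names are hypothetical; this illustrates the limitation only and is not CGGM's update rule.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean (L2) norm, used here as the gradient magnitude."""
    return math.sqrt(dot(u, u))


def cosine(u, v):
    """Cosine similarity, used here as the measure of gradient direction."""
    return dot(u, v) / (norm(u) * norm(v))


def scale_gradient(grad, k):
    """Magnitude-only modulation: multiply every component by the scalar k."""
    return [k * g for g in grad]


# Hypothetical toy gradients (illustrative values, not from the paper).
g_audio = [0.2, -0.5, 0.1]
g_fusion = [0.4, 0.3, -0.2]
k = 2.5  # hypothetical modulation coefficient; must be positive

g_scaled = scale_gradient(g_audio, k)

norm_before, norm_after = norm(g_audio), norm(g_scaled)
cos_before, cos_after = cosine(g_audio, g_fusion), cosine(g_scaled, g_fusion)

print(f"norm:   {norm_before:.4f} -> {norm_after:.4f}")
print(f"cosine: {cos_before:.4f} -> {cos_after:.4f}")
```

Because a positive scalar cancels in the cosine ratio, any scheme that only rescales gradients leaves their direction untouched; steering the direction needs an additional mechanism, which is the gap CGGM targets.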

Paper Structure

This paper contains 18 sections, 15 equations, 7 figures, 8 tables, and 1 algorithm.

Figures (7)

  • Figure 1: The overall architecture of CGGM. During the training stage, classifiers are introduced to calculate the directions of unimodal gradients and evaluation metrics. During the inference stage, the classifiers are discarded.
  • Figure 2: (a) Accuracy of each modality and of the fusion. (b) Gradient magnitude of each modality, measured as the Euclidean norm of the gradient vector. (c) Gradient direction of each modality relative to the fusion, measured as the cosine similarity between the two gradient vectors. All results are obtained on the CMU-MOSI dataset.
  • Figure 3: Changes in (a) performance, (b) gradient magnitude, and (c) gradient direction during training with CGGM, on the CMU-MOSI dataset.
  • Figure 4: t-SNE visualization of the classifier gradients and the unimodal gradients. Each point represents the gradient vector or matrix computed on one batch of data.
  • Figure 5: Performance improvement over the joint-training baseline for different values of $\rho$ and $\lambda$.
  • ...and 2 more figures
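Figure 2 tracks two quantities: the Euclidean norm of each modality's gradient (magnitude) and the cosine similarity between each modality's gradient and the fusion gradient (direction). A minimal plain-Python sketch of those two measurements follows. It assumes each gradient has already been flattened into a list and that the unimodal and fusion gradients are taken with respect to the same quantity, so the vectors have equal length. The modality names, the gradient values, and the helper `gradient_stats` are hypothetical; in practice the vectors would come from a backward pass on one batch, which is not shown.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Gradient magnitude: Euclidean norm of a flattened gradient."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Gradient direction: cosine of the angle between two gradients."""
    if len(u) != len(v):
        raise ValueError("gradients must have the same length")
    denom = l2_norm(u) * l2_norm(v)
    if denom == 0.0:
        return 0.0  # undefined for a zero gradient; report 0 by convention
    return sum(a * b for a, b in zip(u, v)) / denom


def gradient_stats(
    unimodal: Dict[str, Vector], fusion: Vector
) -> Dict[str, Dict[str, float]]:
    """Per-modality magnitude and direction relative to the fusion gradient."""
    return {
        name: {
            "magnitude": l2_norm(g),
            "cos_to_fusion": cosine_similarity(g, fusion),
        }
        for name, g in unimodal.items()
    }


# Hypothetical flattened gradients for one batch (illustrative values only).
unimodal_grads = {
    "text": [0.9, 0.4, -0.1],
    "audio": [0.05, -0.02, 0.01],
    "vision": [0.1, 0.0, 0.05],
}
fusion_grad = [1.0, 0.35, -0.05]

stats = gradient_stats(unimodal_grads, fusion_grad)
for name, s in stats.items():
    print(f"{name:>6}: |g| = {s['magnitude']:.3f}, "
          f"cos(g, g_fusion) = {s['cos_to_fusion']:.3f}")
```

Logging these two numbers per modality at every step is enough to reproduce curves in the style of Figure 2(b) and 2(c).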