Negative to Positive Co-learning with Aggressive Modality Dropout

Nicholas Magal, Minh Tran, Riku Arakawa, Suzanne Nie

TL;DR

The paper tackles the problem of negative co-learning (NCL) in multimodal models when modalities are unavailable at test time. It introduces aggressive modality dropout during training to force reliance on multiple modalities and to prepare for unimodal deployment, enabling reversal of NCL to positive co-learning (PCL). Evaluations with bi-EFLSTM and Memory Fusion Network on IEMOCAP and MOSI show that high dropout (around 0.8) on non-language modalities can dramatically boost unimodal performance under NCL, with more modest gains in PCL. The results suggest that modality dropout enhances robustness for deployment scenarios with missing modalities and offers guidance for future work on dropout levels and modality selection.

Abstract

This paper documents an effective way to improve multimodal co-learning using aggressive modality dropout. We find that aggressive modality dropout can reverse negative co-learning (NCL) to positive co-learning (PCL). Aggressive modality dropout can be used to "prep" a multimodal model for unimodal deployment, and it dramatically increases model performance during negative co-learning; in some experiments we saw a 20% gain in accuracy. We also benchmark our modality dropout technique under PCL and show that it improves co-learning there as well, although the effect is not as substantial as it is during NCL. GitHub: https://github.com/nmagal/modality_drop_for_colearning
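As a rough illustration of the idea, the following is a minimal sketch of modality dropout applied to a training batch. It is not taken from the paper's repository. The function name `modality_dropout`, the `(batch, time, features)` tensor layout, the choice to zero out a dropped modality, and the independent per-sample decision for audio and visual inputs are all assumptions made for this example. Only the language modality being kept and the dropout rate of around 0.8 come from the text above.

```python
import numpy as np


def modality_dropout(language, audio, visual, p_drop=0.8, rng=None, training=True):
    """Zero out the non-language modalities for a random subset of samples.

    Assumptions (not confirmed by the source): inputs are shaped
    (batch, time, features), a dropped modality is replaced by zeros,
    and audio and visual are dropped independently per sample.
    """
    if not training or p_drop <= 0.0:
        # No dropout at evaluation time or when the rate is zero.
        return language, audio, visual

    rng = np.random.default_rng() if rng is None else rng
    batch = language.shape[0]

    # One keep/drop decision per sample for each non-language modality.
    keep_audio = rng.random(batch) >= p_drop
    keep_visual = rng.random(batch) >= p_drop

    # Broadcast the per-sample mask over the time and feature axes.
    audio = audio * keep_audio[:, None, None]
    visual = visual * keep_visual[:, None, None]

    # The language modality is always passed through unchanged.
    return language, audio, visual
```

With `p_drop=0.8`, most training samples reach the model with language features only, which is the condition the model will face in unimodal deployment.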

Paper Structure (6 sections, 11 equations, 5 figures)


Figures (5)

  • Figure 1: Positive Co-learning (PCL) vs Negative Co-learning (NCL). PCL refers to cases where co-learning leads to better test-time performance on unimodal data than unimodally trained models achieve, while NCL refers to cases where the multimodally trained model performs worse at test time than the unimodally trained variant. Neutral Co-learning (NeCL) refers to cases where the multimodal and unimodal variants perform the same.
  • Figure 2: Extracted Features of Hands and Head in IEMOCAP
  • Figure 3: Image Processing
  • Figure 4: Confusion Matrix of the bi-EFLSTM at 0% modality dropout vs 80% dropout on both the audio and visual modalities. As shown, without modality dropout the model struggles to learn anything meaningful and only outputs the neutral class. With modality dropout the model is able to correctly classify other classes and obtains much better performance.
  • Figure 5: Performance of the bi-EFLSTM on IEMOCAP (left) and of the MFN on MOSI (right) under modality dropout. The light blue row in each table corresponds to the unimodal variant; the remaining rows are the multimodal variant. As shown, modality dropout with the bi-EFLSTM substantially improves the co-learning process across all metrics and is able to reverse NCL to PCL. Modality dropout during PCL with the MFN does not have as large an effect as it does during NCL. However, it still improves metrics compared to non-dropout settings.
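
The PCL, NCL, and NeCL definitions in the Figure 1 caption amount to comparing two scores on the same unimodal test data. A minimal sketch of that comparison follows. The function name `colearning_outcome` and the `tol` parameter are hypothetical, introduced only for this example, and the scores in the usage lines are made-up values rather than results from the paper.

```python
def colearning_outcome(multimodal_score, unimodal_score, tol=0.0):
    """Label a co-learning result from two scores on the same unimodal test set.

    multimodal_score: metric of the multimodally trained model tested unimodally.
    unimodal_score: metric of the unimodally trained model.
    tol: hypothetical tolerance below which the two count as equal.
    """
    diff = multimodal_score - unimodal_score
    if diff > tol:
        return "PCL"   # co-learning helped
    if diff < -tol:
        return "NCL"   # co-learning hurt
    return "NeCL"      # no difference


# Illustrative values only, not taken from the paper.
print(colearning_outcome(0.70, 0.60))
print(colearning_outcome(0.40, 0.60))
```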