ICYM2I: The illusion of multimodal informativeness under missingness

Young Sang Choi, Vincent Jeanselme, Pierre Elias, Shalmali Joshi

TL;DR

This paper introduces ICYM2I (In Case You Multimodal Missed It), a framework for evaluating predictive performance and information gain under missingness using inverse probability weighting-based correction.

Abstract

Multimodal learning is of continued interest in artificial intelligence-based applications, motivated by the potential information gain from combining different data modalities. However, modalities observed in the source environment may differ from the modalities observed in the target environment due to multiple factors, including cost, hardware failure, or the perceived informativeness of a given modality. This change in missingness patterns between the source and target environment has not been carefully studied. Naïve estimation of the information gain associated with including an additional modality without accounting for missingness may result in improper estimates of that modality's value in the target environment. We formalize the problem of missingness, demonstrate its ubiquity, and show that the subsequent distribution shift induces bias when the missingness process is not explicitly accounted for. To address this issue, we introduce ICYM2I (In Case You Multimodal Missed It), a framework for the evaluation of predictive performance and information gain under missingness through inverse probability weighting-based correction. We demonstrate the importance of the proposed adjustment to estimate information gain under missingness on synthetic, semi-synthetic, and real-world datasets.

Paper Structure

This paper contains 28 sections, 3 theorems, 19 equations, 6 figures, 9 tables, 2 algorithms.

Key Result

Lemma 1

The loss function computed on the observed data, $l_{\Omega_{\text{obs}}}(x_1, x_2, y)$, can be reweighted by inverse probability weights to approximate the target loss $l_{\Omega}(x_1, x_2, y)$. The weights are derived from $p(m_1, m_2, m_y \mid C)$, the probability of missingness given the covariates $C$.
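
As a rough illustration of the reweighting idea in Lemma 1, the sketch below applies a generic self-normalised inverse probability weighting estimator to per-sample losses. It is not the authors' implementation: the function name `ipw_weighted_loss`, the synthetic covariate `c`, and the assumption that the probability of a sample being fully observed given $C$ is known exactly are all hypothetical choices made for the example.

```python
import numpy as np


def ipw_weighted_loss(losses, observed, p_observed):
    """Estimate the population mean loss from observed samples only.

    losses     : per-sample loss values (entries for unobserved samples are ignored)
    observed   : boolean mask, True where the sample is fully observed
    p_observed : probability that each sample is fully observed given its
                 covariates C (assumed known here; normally it is estimated)
    """
    losses = np.asarray(losses, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    p_observed = np.asarray(p_observed, dtype=float)

    # Each observed sample stands in for 1 / p_observed similar samples.
    weights = 1.0 / p_observed[observed]

    # Self-normalised estimator: weighted average of the observed losses.
    return float(np.sum(weights * losses[observed]) / np.sum(weights))


# Synthetic example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 200_000

# Binary covariate C that drives both the loss and the missingness.
c = rng.binomial(1, 0.5, size=n)

# Samples with c == 1 have a higher loss on average.
losses = c + 0.1 * rng.normal(size=n)

# Samples with c == 1 are observed far less often than samples with c == 0.
p_observed = np.where(c == 1, 0.2, 0.8)
observed = rng.random(n) < p_observed

full_loss = float(losses.mean())             # target: mean loss over all samples
naive_loss = float(losses[observed].mean())  # unweighted mean over observed samples
ipw_loss = ipw_weighted_loss(losses, observed, p_observed)

print(f"full: {full_loss:.3f}  naive: {naive_loss:.3f}  ipw: {ipw_loss:.3f}")
```

In this toy setup the high-loss samples are observed less often, so the unweighted mean over observed samples should understate the population loss, while the weighted estimate should land close to it. In practice the observation probabilities would have to be estimated from data, which adds its own error.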

Figures (6)

  • Figure 1: Overview of the proposed framework. Curation often discards missing data, resulting in a discrepancy between the collected $\Omega$ and source datasets $\Omega_{\text{source}}$ used for training. Current practice is denoted in blue: naïve training and evaluating on $\Omega_{\text{source}}$ leads to biased estimates of performance and informativeness on target data. The orange path illustrates the proposed ICYM$^2$I: a double inverse probability weighting (IPW) mechanism that yields accurate performance and informativeness estimates under the target distribution.
  • Figure 2: Directed Acyclic Graphs of the assumed data-generating processes. On the left is the commonly assumed graph with no missingness. On the right is the proposed missingness formalism. $X_1$ and $X_2$ are two modalities of interest, $Y$ is the label of interest. The missingness process depends on $C$. Filled point nodes are observed variables, while unfilled nodes are unobserved. Gray edges indicate MAR missingness for a given modality.
  • Figure 3: Data generating processes for synthetic experiments. $z_i$ denote latent vectors, while all other variables are observed. Filled point nodes are observed variables, while unfilled nodes are unobserved.
  • Figure 4: Comparison between estimated PID values under increasing missingness in UR-FUNNY.
  • Figure 5: Comparison between estimated PID values under increasing missingness in UR-FUNNY.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Lemma 1: IPW Training
  • Corollary 1: ICYM$^2$I-learn
  • Lemma 2: Corrected mutual information
  • proof
  • proof