Exploring Multi-Modality Dynamics: Insights and Challenges in Multimodal Fusion for Biomedical Tasks

Laura Wenderoth

TL;DR

Feature informativeness improves performance and explainability, while modality informativeness provides no significant advantage and can degrade performance.

Abstract

This paper investigates the MM dynamics approach proposed by Han et al. (2022) for multi-modal fusion in biomedical classification tasks. The MM dynamics algorithm integrates feature-level and modality-level informativeness to fuse modalities dynamically and thereby improve classification performance. However, our analysis reveals several limitations and challenges in replicating and extending the results of MM dynamics. We found that feature informativeness improves performance and explainability, while modality informativeness provides no significant advantage and can degrade performance. Based on these results, we extended feature informativeness to image data, resulting in Image MM dynamics. Although this approach showed promising qualitative results, it did not outperform baseline methods quantitatively.
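To make the fusion concrete, here is a minimal, untrained forward-pass sketch in plain Python of the three steps described in the Figure 1 caption. It is not the implementation of Han et al. (2022). The sigmoid gating, the ReLU latent layer, the bias-free dense layers, the random weights, and the names `ModalityBranch`, `mm_dynamics_forward` and `true_class_probability` are all hypothetical choices made for illustration. The sketch also assumes that $g^m$ reads the same latent vector that is then weighted by $\widehat{TCP}$. The latent dimensions (250 for RNA, 35 for protein) follow the figure captions; the feature counts and the number of classes are made up.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(W, x):
    # Bias-free dense layer: W is a list of rows (out x in), x has length `in`.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ModalityBranch:
    """Hypothetical branch for one modality m (random, untrained weights)."""

    def __init__(self, rng, n_features, latent_dim, n_classes):
        self.E = rand_matrix(rng, n_features, n_features)  # encoder E^m
        self.F = rand_matrix(rng, latent_dim, n_features)  # latent layer of f^m
        self.C = rand_matrix(rng, n_classes, latent_dim)   # output layer of f^m
        self.g = rand_matrix(rng, 1, latent_dim)           # TCP regressor g^m

    def forward(self, x):
        # Step 1: feature informativeness, one weight in (0, 1) per input feature.
        w = [sigmoid(v) for v in matvec(self.E, x)]
        x_weighted = [xi * wi for xi, wi in zip(x, w)]
        h = [max(0.0, v) for v in matvec(self.F, x_weighted)]  # ReLU latent
        uni_probs = softmax(matvec(self.C, h))  # uni-modal prediction of f^m
        # Step 2: modality informativeness, an estimated TCP in (0, 1).
        tcp_hat = sigmoid(matvec(self.g, h)[0])
        weighted_h = [tcp_hat * v for v in h]  # scale latent by TCP estimate
        return {"w": w, "uni_probs": uni_probs,
                "tcp_hat": tcp_hat, "weighted_h": weighted_h}


def true_class_probability(probs, label):
    # TCP: predicted probability of the ground-truth class (target for g^m).
    return probs[label]


def mm_dynamics_forward(branches, inputs, W_out):
    outs = [b.forward(x) for b, x in zip(branches, inputs)]
    # Step 3: concatenate the TCP-weighted latents and apply a final classifier.
    fused = [v for o in outs for v in o["weighted_h"]]
    return softmax(matvec(W_out, fused)), outs


rng = random.Random(0)
n_classes = 4  # hypothetical
# Latent dims follow the captions (RNA 250, protein 35); feature counts are made up.
branches = [ModalityBranch(rng, 20, 250, n_classes),
            ModalityBranch(rng, 10, 35, n_classes)]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(20)],
          [rng.gauss(0.0, 1.0) for _ in range(10)]]
W_out = rand_matrix(rng, n_classes, 250 + 35)
probs, outs = mm_dynamics_forward(branches, inputs, W_out)
tcp_target = true_class_probability(outs[0]["uni_probs"], label=2)
```

During training, $\widehat{TCP}$ would be regressed onto the true-class probability of the uni-modal prediction. The sketch only shows the mechanism by which a low estimate shrinks a modality's contribution to the concatenated representation.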

Paper Structure

This paper contains 23 sections, 8 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Schematic overview of the MM dynamics algorithm, which consists of three main steps. First, the feature informativeness is computed using the encoder $E^m$, where $m = 1, \dots, M$ indexes the $M$ modalities. The input is then multiplied by the feature informativeness vector and passed to the uni-modal classifier $f^m$. Second, the modality informativeness, also referred to as the true class probability ($TCP$), is estimated by the regression networks $g^m$, and the latent representation of $g^m$ is weighted by the estimate $\widehat{TCP}$. Finally, the dynamically weighted representations are concatenated, and the final classification is performed by another classifier (not shown in the figure). Reprinted from Han et al. (2022).
  • Figure 2: Representative examples of the four cell classes in the dataset used in this study (Matek et al., 2021): (a) a monocyte, (b) a neutrophil, (c) a lymphocyte, and (d) an erythroblast.
  • Figure 3: Overview of ablation studies examining the impact of the feature informativeness (FI) and modality informativeness (MI) components of the MM dynamics approach. 'FI' indicates that feature informativeness is included, 'MI' that modality informativeness is included, 'Both' that both components are incorporated, and 'None' that neither component is used. The figure shows results across several evaluation metrics. All results were obtained with MM dynamics trained on RNA and protein data with latent representation dimensions of 250 and 35, respectively.
  • Figure 4: Overview of classification confidence, shown as the relative error (RE) of $TCP$. The x-axis gives the relative-error threshold $x$; the y-axis shows the percentage of instances whose RE reaches or exceeds $x$, i.e. for which $(1 + x)\,TCP \leq \widehat{TCP}$. $TCP$ was extracted from the best-performing MM dynamics network, with latent dimensions of 35 for protein and 250 for RNA.
  • Figure 5: The top 10 most informative biomarkers identified by the MM dynamics feature informativeness encoder for the protein (a) and RNA (b) modalities.
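The curve in Figure 4 can be computed directly from paired true and estimated TCP values. Below is a minimal sketch, assuming plain Python lists of per-instance values. The function name `overestimation_curve` and the toy numbers are hypothetical and are not results from the paper. Rearranging the caption's inequality, $(1 + x)\,TCP \leq \widehat{TCP}$ is equivalent, for $TCP > 0$, to $(\widehat{TCP} - TCP)/TCP \geq x$.

```python
def overestimation_curve(tcp, tcp_hat, thresholds):
    """Percentage of instances with x * TCP + TCP <= TCP_hat, for each threshold x.

    For TCP > 0 this is the share whose relative error (TCP_hat - TCP) / TCP
    is at least x, i.e. instances where the estimate overshoots the true TCP.
    """
    n = len(tcp)
    curve = []
    for x in thresholds:
        hits = sum(1 for t, t_hat in zip(tcp, tcp_hat) if x * t + t <= t_hat)
        curve.append(100.0 * hits / n)
    return curve


# Toy values (hypothetical), not results from the paper.
tcp = [0.5, 0.8, 0.2, 0.9]
tcp_hat = [0.6, 0.8, 0.5, 0.7]
print(overestimation_curve(tcp, tcp_hat, [0.0, 0.1, 0.5, 1.0, 2.0]))
# -> [75.0, 50.0, 25.0, 25.0, 0.0]
```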
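The Figure 5 caption does not state how the top-10 ranking is derived from the feature informativeness encoder. One plausible approach, assumed here, is to average each feature's informativeness weight over all samples and keep the k largest. The function name `top_k_features`, the placeholder feature names, and the weights below are hypothetical.

```python
def top_k_features(informativeness, feature_names, k=10):
    """Rank features by their mean informativeness weight across samples.

    `informativeness` holds one vector per sample, with one weight per feature,
    e.g. the outputs of a feature informativeness encoder.
    """
    n = len(informativeness)
    means = [sum(sample[j] for sample in informativeness) / n
             for j in range(len(feature_names))]
    ranked = sorted(zip(feature_names, means), key=lambda p: p[1], reverse=True)
    return ranked[:k]


# Hypothetical weights for three placeholder features and two samples.
weights = [[0.9, 0.1, 0.5],
           [0.7, 0.3, 0.6]]
names = ["feature_a", "feature_b", "feature_c"]
print([name for name, _ in top_k_features(weights, names, k=2)])
# -> ['feature_a', 'feature_c']
```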