Effective and Robust Multimodal Medical Image Analysis

Joy Dhar; Nayyar Zaidi; Maryam Haghighat

TL;DR

We propose a novel Multi-Attention Integration Learning (MAIL) network with two key components: an efficient residual learning attention block that captures refined modality-specific multi-scale patterns, and an efficient multimodal cross-attention module that learns enriched, complementary shared representations across diverse modalities.
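
To make "multimodal cross-attention" concrete, here is a minimal NumPy sketch of generic scaled dot-product cross-attention between two modalities, computed in both directions in parallel. It assumes each modality has already been encoded into a sequence of d-dimensional tokens. The function names (`cross_attention`, `parallel_cross_fusion`), the random projection matrices, and the final averaging step are hypothetical illustrations, not the paper's EMCAM, which additionally uses multi-frequency and spatial attention (MFIFA and EMSCA).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    # Tokens of one modality (queries) attend to tokens of another (keys/values).
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def parallel_cross_fusion(x_a, x_b, params):
    # Hypothetical parallel fusion: both directions read the ORIGINAL features
    # (nothing is chained), and each result is added back residually.
    fused_a = x_a + cross_attention(x_a, x_b, *params["a_to_b"])
    fused_b = x_b + cross_attention(x_b, x_a, *params["b_to_a"])
    # Pool each modality over its tokens and average into one shared vector.
    return 0.5 * (fused_a.mean(axis=0) + fused_b.mean(axis=0))

rng = np.random.default_rng(0)
d = 16
x_mri = rng.standard_normal((49, d))  # e.g. 7x7 patch tokens from modality A
x_ct = rng.standard_normal((64, d))   # e.g. 8x8 patch tokens from modality B
params = {
    key: tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    for key in ("a_to_b", "b_to_a")
}
shared = parallel_cross_fusion(x_mri, x_ct, params)
print(shared.shape)  # (16,)
```

Because both attention directions consume the unmodified inputs, this arrangement differs from a cascaded one, where a second attention stage would operate on the first stage's output.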

Abstract

Multimodal Fusion Learning (MFL), which leverages disparate data from various imaging modalities (e.g., MRI, CT, SPECT), has shown great potential for addressing medical problems such as skin cancer and brain tumor prediction. However, existing MFL methods face three key limitations: a) they often specialize in specific modalities and overlook effective shared complementary information across diverse modalities, limiting their generalizability for multi-disease analysis; b) they rely on computationally expensive models, restricting their applicability in resource-limited settings; and c) they lack robustness against adversarial attacks, compromising reliability in medical AI applications. To address these limitations, we propose a novel Multi-Attention Integration Learning (MAIL) network incorporating two key components: a) an efficient residual learning attention block for capturing refined modality-specific multi-scale patterns and b) an efficient multimodal cross-attention module for learning enriched complementary shared representations across diverse modalities. Furthermore, to ensure adversarial robustness, we extend the MAIL network into Robust-MAIL by incorporating random projection filters and modulated attention noise. Extensive evaluations on 20 public datasets show that both MAIL and Robust-MAIL outperform existing methods, achieving performance gains of up to 9.34% while reducing computational costs by up to 78.3%. These results highlight the superiority of our approaches, which deliver more reliable predictions than top competitors. Code: https://github.com/misti1203/MAIL-Robust-MAIL.
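
The robustness extension relies on random projection filters and modulated attention noise. The NumPy sketch below shows one common way such ingredients can be realized, under stated assumptions: (i) a random projection filter is a 1x1 channel-mixing "filter" whose weights are freshly sampled from a Gaussian and never trained, and (ii) modulated noise is Gaussian noise scaled by per-channel weights that would be learned in a real model (fixed here). The names `random_projection_filter`, `modulated_noise`, and `robust_feature_block`, and the blending with the original features, are hypothetical; the paper's RPAN module may place and combine these components differently.

```python
import numpy as np

def random_projection_filter(features, rng, out_channels=None):
    # features: (C, H, W). Sample a fresh Gaussian (out_channels x C) matrix and
    # apply it as an untrained 1x1 convolution across channels.
    c = features.shape[0]
    out_channels = out_channels or c
    proj = rng.standard_normal((out_channels, c)) / np.sqrt(c)
    return np.einsum("oc,chw->ohw", proj, features)

def modulated_noise(features, noise_scale, rng):
    # noise_scale: (C,) per-channel weights (learnable in a real model).
    # Gaussian noise is scaled channel-wise and added to the features.
    noise = rng.standard_normal(features.shape)
    return features + noise_scale[:, None, None] * noise

def robust_feature_block(features, noise_scale, rng, mix=0.5):
    # Hypothetical combination: blend the features with a random projection,
    # then perturb the result with modulated noise.
    projected = random_projection_filter(features, rng)
    blended = (1.0 - mix) * features + mix * projected
    return modulated_noise(blended, noise_scale, rng)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
scale = np.full(8, 0.1)
y1 = robust_feature_block(x, scale, rng)
y2 = robust_feature_block(x, scale, rng)
print(y1.shape)             # (8, 4, 4)
print(np.allclose(y1, y2))  # False: each call draws a new projection and new noise
```

The stochasticity is the point of the design: an attacker computing gradients through one sampled projection and noise pattern faces a different one at inference time.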

Paper Structure (17 sections, 25 equations, 6 figures, 7 tables, 2 algorithms)

Introduction
Related Study
Proposed Method
Modality-Specific Task Learning Phase
Efficient Residual Learning Attention Block (ERLA)
Efficient Multimodal Cross Attention Module (EMCAM)
Target-specific Multitask Learning
Random Projection with Attention Noise
Adversarial Training with RPAN
Experimental Analysis and Results
Performance Comparisons
Impact of Robust-MAIL Network
Ablation Study
Conclusion
RPF: Random Projection Filter
...and 2 more sections

Figures (6)

Figure 1: (A–B) Attention-alignment paradigms for MFL: cascaded attention (e.g., DRIFA-Net [dhar2024multimodal], MuMu [islam2022mumu]) vs. our parallel fusion attention (EMCAM in MAIL). Unlike cascaded pipelines, the parallel design reduces information loss during shared-representation learning.
Figure 2: Architecture of the MAIL network, comprising (A) MSTL and (B) TMTL phases. Key components within the MSTL phase are the (C) ERLA and (F) EMCAM blocks. ERLA is based on the (D) EMILA module, which comprises a (G) MSGDC block inspired by EMCAD's MSDC [rahman2024emcad] and (E) Channel Attention (CA). EMCAM details are in Figure 3.
Figure 3: Components of EMCAM: (A) MFIFA module captures multi-frequency multimodal global contexts, while (B) EMSCA module refines multimodal spatial representations.
Figure 4: (A–B) Overview of Robust-MAIL, where MAIL integrates EMCAM and incorporates RPF and MAN to form the RPAN module. RPAN comprises the following key components: RPF[A] / RPF[I] (sampled Gaussian matrices), EMCAM[A] / EMCAM[I], and RPAN[A] / RPAN[I], where [A] and [I] denote the attack and inference phases. (C–D) Illustration of MAN, which uses learnable feature-layer noise: random noise is modulated by learnable weights to enhance robustness. (E) Integration of RPF into the MSGDC from ERLA and EMCAM.
Figure 5: Evaluation of Robust-MAIL's performance against stronger PGD attacks on the D5 and D3 datasets.
...and 1 more figure
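
Figure 5 evaluates Robust-MAIL against stronger PGD (projected gradient descent) attacks. For reference, here is a minimal NumPy sketch of an L-infinity PGD attack on a toy logistic-regression classifier, chosen because its input gradient has a closed form. The model, `eps`, `alpha`, and `steps` values are illustrative assumptions, not the paper's attack settings; a "stronger" PGD attack typically means more steps or a larger `eps`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, y, w, b):
    # Gradient of binary cross-entropy w.r.t. the input x for p = sigmoid(w.x + b):
    # dL/dx = (p - y) * w
    p = sigmoid(x @ w + b)
    return (p - y) * w

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10, rng=None):
    # L-infinity PGD: random start inside the eps-ball, then repeated
    # signed-gradient ascent on the loss with projection after every step.
    if rng is None:
        rng = np.random.default_rng(0)
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        grad = input_gradient(x_adv, y, w, b)
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep the valid pixel range
    return x_adv

rng = np.random.default_rng(1)
w = rng.standard_normal(64)
b = 0.0
x = rng.uniform(0.0, 1.0, size=64)
y = 1.0 if sigmoid(x @ w + b) >= 0.5 else 0.0  # use the clean prediction as the label
x_adv = pgd_attack(x, y, w, b, eps=0.05, steps=20)
print(np.abs(x_adv - x).max() <= 0.05 + 1e-12)  # True: perturbation stays in the ball
```

Each step pushes the input in the direction that increases the loss, and the two clipping operations keep the adversarial example within `eps` of the original and inside [0, 1].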
