MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality

Zhiyi Shi; Junsik Kim; Wanhua Li; Yicong Li; Hanspeter Pfister

TL;DR

The paper addresses two obstacles to applying multi-modal pre-trained transformers to disease diagnosis: modalities are often missing, and full fine-tuning is costly. It introduces Modality-aware Low-Rank Adaptation (MoRA), which pairs a shared down-projection with modality-specific up-projections to form low-rank adaptations. MoRA is inserted into the first block of a ViLT backbone, and only MoRA and the classifier are trained. Across missing-modality scenarios on chest X-ray and ocular-disease datasets, MoRA is reported to be more robust and more accurate than existing methods, while training fewer than 1.6% of the parameters needed for full fine-tuning and reducing training time. This makes resource-efficient deployment of multi-modal diagnostic systems in clinical settings more practical, and the approach can be extended to larger pre-trained models and additional modalities in future work.

Abstract

Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is the frequent occurrence of missing modalities, which impairs performance. Additionally, fine-tuning the entire pre-trained model demands substantial computational resources. To address these issues, we introduce Modality-aware Low-Rank Adaptation (MoRA), a computationally efficient method. MoRA projects each input to a low intrinsic dimension but uses different modality-aware up-projections for modality-specific adaptation in cases of missing modalities. In practice, MoRA is integrated into the first block of the model and significantly improves performance when a modality is missing. It requires minimal computational resources, training less than 1.6% of the parameters needed to train the entire model. Experimental results show that MoRA outperforms existing techniques in disease diagnosis, demonstrating superior performance, robustness, and training efficiency.
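The shared down-projection and modality-aware up-projections described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MoRASketch`, the helper `adapted_projection`, the three case labels, the Gaussian and zero initialisation (borrowed from standard LoRA), and the choice to add the adaptation to a single frozen linear projection are all assumptions made for illustration.

```python
import random

# Hypothetical labels for the missing-modality cases (not taken from the paper).
MISSING_CASES = ("complete", "missing_text", "missing_image")


def matvec(matrix, vector):
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class MoRASketch:
    """Shared down-projection plus one up-projection per missing-modality case."""

    def __init__(self, dim, rank, cases=MISSING_CASES, seed=0):
        rng = random.Random(seed)
        # Shared down-projection (dim -> rank), applied to every input token.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(rank)]
        # One up-projection (rank -> dim) per case. Zero initialisation is an
        # assumption borrowed from LoRA: the adapted model starts out
        # identical to the frozen one.
        self.up = {case: [[0.0] * rank for _ in range(dim)] for case in cases}

    def adaptation(self, token, case):
        """Low-rank update for one token, using the up-projection for `case`."""
        low_rank = matvec(self.down, token)
        return matvec(self.up[case], low_rank)

    def num_parameters(self):
        """Count the trainable adapter parameters."""
        down = len(self.down) * len(self.down[0])
        up = sum(len(m) * len(m[0]) for m in self.up.values())
        return down + up


def adapted_projection(frozen_weight, token, mora, case):
    """Frozen projection output plus the adaptation selected for `case`."""
    base = matvec(frozen_weight, token)
    delta = mora.adaptation(token, case)
    return [b + d for b, d in zip(base, delta)]
```

In this sketch the frozen weight is never modified; only `down` and the per-case `up` matrices would be trained, together with the classifier. With `dim=8`, `rank=2`, and three cases, `num_parameters()` counts 2·8 shared weights plus 3·(8·2) case-specific weights, 64 in total, which shows why the adapter stays small relative to a full `dim × dim` projection as `dim` grows.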

Paper Structure (13 sections, 4 equations, 2 figures, 5 tables)

Introduction
Method
Problem Definition
Modality-Aware Low-Rank Adaptation
Overall Framework
Experiments
Datasets
Implementation Details
Comparisons with the previous method
Ablation Study
Conclusion
Acknowledgments
Disclosure of Interests

Figures (2)

Figure 1: The structure of MoRA. Images and texts with different missing modalities are separately embedded into input tokens. MoRA projects these input tokens into a low-rank space and uses modality-aware up-projections to obtain modality-aware adaptations. MoRA then selects the adaptation that matches the missing-modality case. This adaptation is plugged into the first block of the multi-modal pre-trained model (consisting of transformer blocks in our experiments) to extract features. The output class token is fed to the classifier for multi-disease diagnosis. Trainable parameters are marked with flame icons, and frozen ones are marked with lock icons.
Figure 2: F1-Macro scores on ODIR with different missing rates.
