MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality

Zhiyi Shi, Junsik Kim, Wanhua Li, Yicong Li, Hanspeter Pfister

TL;DR

The paper addresses two obstacles to applying multi-modal pre-trained transformers to disease diagnosis: modalities are often missing, and full fine-tuning is costly. It introduces Modality-aware Low-Rank Adaptation (MoRA), which forms low-rank adaptations from a shared down-projection and modality-specific up-projections. The adaptation is inserted into the first block of a ViLT backbone, and only the MoRA parameters and the classifier are trained. On chest X-ray and ocular-disease datasets, MoRA is reported to be more robust and more accurate than existing techniques across missing-modality scenarios, while training less than 1.6% of the parameters needed for full fine-tuning and reducing training time. This makes resource-efficient deployment of multi-modal diagnostic systems in clinical settings more practical, and the approach can be extended to larger pre-trained models and additional modalities in future work.

Abstract

Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is the frequent occurrence of missing modalities, which impairs performance. Additionally, fine-tuning the entire pre-trained model demands substantial computational resources. To address these issues, we introduce Modality-aware Low-Rank Adaptation (MoRA), a computationally efficient method. MoRA projects each input to a low intrinsic dimension but uses different modality-aware up-projections for modality-specific adaptation in cases of missing modalities. Practically, MoRA integrates into the first block of the model, significantly improving performance when a modality is missing. It requires minimal computational resources, with less than 1.6% of the trainable parameters needed compared to training the entire model. Experimental results show that MoRA outperforms existing techniques in disease diagnosis, demonstrating superior performance, robustness, and training efficiency.
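
The adaptation described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the case names, the dimensions, the zero initialisation of the up-projections (borrowed from standard LoRA), and the choice to add the update to the output of a frozen block are all assumptions made for illustration. It shows only the core idea of one shared down-projection with a separate up-projection selected by the missing-modality case.

```python
import random

# Hypothetical missing-modality cases; the names are illustrative only.
CASES = ("complete", "missing_text", "missing_image")


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class ModalityAwareLoRA:
    """Shared down-projection plus one up-projection per missing-modality case."""

    def __init__(self, dim, rank, seed=0):
        rng = random.Random(seed)
        # Shared down-projection (dim -> rank) with a small random init.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(rank)] for _ in range(dim)]
        # One up-projection (rank -> dim) per case. Zero init is an assumption
        # taken from standard LoRA, so the adapter starts as a no-op.
        self.up = {case: [[0.0] * dim for _ in range(rank)] for case in CASES}

    def adaptation(self, tokens, case):
        """Return the low-rank update for tokens of shape (n_tokens, dim)."""
        low_rank = matmul(tokens, self.down)    # (n_tokens, rank)
        return matmul(low_rank, self.up[case])  # (n_tokens, dim)

    def num_parameters(self):
        """Count the adapter's trainable weights."""
        dim, rank = len(self.down), len(self.down[0])
        return dim * rank + len(self.up) * rank * dim


def adapted_block(frozen_block, adapter, tokens, case):
    """Add the case-specific update to a frozen block's output (an assumption)."""
    base = frozen_block(tokens)
    delta = adapter.adaptation(tokens, case)
    return [[b + d for b, d in zip(b_row, d_row)] for b_row, d_row in zip(base, delta)]
```

In this sketch only `down` and the entries of `up` would be trained alongside the classifier, while `frozen_block` stands in for the first block of the pre-trained backbone. Because every case shares `down`, adding a further case costs only one extra `rank x dim` matrix.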

Paper Structure

This paper contains 13 sections, 4 equations, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: The structure of MoRA. Images and texts, under different missing-modality cases, are separately embedded into input tokens. MoRA projects these input tokens to a low-rank space and applies modality-aware up-projections to obtain modality-aware adaptations. MoRA then selects the adaptation that matches the missing case. This adaptation is plugged into the first block of the multi-modal pre-trained model (consisting of transformer blocks in our experiments) to extract features. We feed the output class token to the classifier for multi-disease diagnosis. Trainable parameters are marked with flame icons, and frozen ones with lock icons.
  • Figure 2: F1-Macro scores on ODIR with different missing rates.