MFM-DA: Instance-Aware Adaptor and Hierarchical Alignment for Efficient Domain Adaptation in Medical Foundation Models
Jia-Xuan Jiang, Wenhui Lei, Yifeng Wu, Hongtao Wu, Furong Li, Yining Xie, Xiaofan Zhang, Zhong Wang
TL;DR
This work addresses domain gaps in medical foundation models (MFMs) under limited target-domain data by introducing MFM-DA, a two-stage, few-shot unsupervised domain adaptation framework. Stage 1 trains a DDPM on the source domain and adapts it to the target domain with a Dynamic Instance-Aware Adaptor guided by a distribution-consistency loss $\mathcal{L}_{DC}$, so that the model translates source images into the target style. Stage 2 fine-tunes the medical foundation model on the generated target-like images using LoRA-based attention tuning and Pyramid Hierarchical Alignment, with a total loss $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{BCE}} + \mathcal{L}_{\text{align}}$. The Dynamic Instance-Aware Adaptor combines a static domain direction $\Delta_{\text{static}}$ with a learnable per-batch direction $\Delta_{\text{dynamic}}$ to form $\Delta$. Feature alignment across $n$ pyramid levels, together with low-rank adapters, provides the cross-domain alignment. The framework reports strong improvements on fundus segmentation and diffusion-based generation, which suggests a practical way to mitigate domain gaps for MFMs in few-shot scenarios.
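The Stage 2 objective in the TL;DR, $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{BCE}} + \mathcal{L}_{\text{align}}$, can be illustrated with a minimal NumPy sketch. The summary does not give the form of $\mathcal{L}_{\text{align}}$, so the alignment term below is an assumption: it penalizes the squared gap between channel-wise mean features of two feature sets at each of the $n$ pyramid levels. The function names `bce_loss`, `align_loss`, and `total_loss` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and binary masks."""
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))


def align_loss(feats_a, feats_b):
    """Assumed alignment term (not the paper's definition).

    feats_a and feats_b are lists with one array per pyramid level, each of
    shape (batch, channels, height, width). For every level, the channel-wise
    mean features of the two sets are compared with a mean squared gap, and
    the per-level gaps are summed.
    """
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        mu_a = fa.mean(axis=(0, 2, 3))  # per-channel mean over batch and space
        mu_b = fb.mean(axis=(0, 2, 3))
        total += float(np.mean((mu_a - mu_b) ** 2))
    return total


def total_loss(probs, targets, feats_a, feats_b):
    """L_total = L_BCE + L_align, as stated in the summary."""
    return bce_loss(probs, targets) + align_loss(feats_a, feats_b)
```

With identical feature lists the alignment term vanishes and the total reduces to the segmentation loss alone, which is a quick sanity check for any alternative choice of alignment term.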
Abstract
Medical Foundation Models (MFMs), trained on large-scale datasets, have demonstrated superior performance across various tasks. However, these models still struggle with domain gaps in practical applications. Specifically, even after fine-tuning on source-domain data, task-adapted foundation models often perform poorly in the target domain. To address this challenge, we propose a few-shot unsupervised domain adaptation (UDA) framework for MFMs, named MFM-DA, which leverages only a limited number of unlabeled target-domain images. Our approach begins by training a Denoising Diffusion Probabilistic Model (DDPM), which is then adapted to the target domain using a proposed dynamic instance-aware adaptor and a distribution direction loss, enabling the DDPM to translate source-domain images into the target-domain style. The adapted images are subsequently processed through the MFM, where we introduce a channel-spatial alignment Low-Rank Adaptation (LoRA) module to ensure effective feature alignment. Extensive experiments on optic cup and disc segmentation tasks demonstrate that MFM-DA outperforms state-of-the-art methods. Our work provides a practical solution to the domain gap issue in real-world MFM deployment. Code will be made available.
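For readers unfamiliar with the Low-Rank Adaptation mentioned in the abstract, the following is a minimal NumPy sketch of a generic LoRA linear layer. It shows only the standard low-rank update $W + \frac{\alpha}{r} B A$ applied to a frozen weight; it does not reproduce the channel-spatial alignment design proposed in the paper. The class name `LoRALinear` and the defaults for `rank`, `alpha`, and the initialization scale are assumptions chosen for illustration.

```python
import numpy as np


class LoRALinear:
    """Generic LoRA wrapper around a frozen linear weight (illustrative only)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pretrained weight, shape (out_dim, in_dim)
        # Low-rank factors: A starts small and random, B starts at zero,
        # so the adapted layer initially matches the frozen layer exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, in_dim); output has shape (batch, out_dim).
        frozen = x @ self.weight.T
        update = self.scale * (x @ self.A.T) @ self.B.T
        return frozen + update
```

Only `A` and `B` would be trained, which is `rank * (in_dim + out_dim)` parameters instead of `in_dim * out_dim` for the full weight.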
