Less Could Be Better: Parameter-efficient Fine-tuning Advances Medical Vision Foundation Models

Chenyu Lian, Hong-Yu Zhou, Yizhou Yu, Liansheng Wang

TL;DR

Transfer learning in medical imaging is often limited by scarce annotations. This paper investigates whether parameter-efficient fine-tuning (PEFT) can outperform full-parameter fine-tuning (FFT) for chest radiography foundation models. It compares LoRA, a representative PEFT method, against FFT on two self-supervised radiography foundation models (MAE and MRM) across the NIH ChestX-ray14, CheXpert, and RSNA datasets at label ratios of $1\%$, $10\%$, and $100\%$, giving 18 transfer tasks in total (two models, three datasets, three label ratios). LoRA outperforms FFT in 13 of the 18 tasks and reaches an AUROC of $80.6\%$ on NIH ChestX-ray14 with $1\%$ labeled data while tuning only $0.3\%$ of parameters; at $100\%$ labeled data, LoRA matches FFT while tuning about $1.5\%$ of parameters. The approach also scales to larger vision backbones (ViT-Base/Large) and yields strong gains even when the backbone is pre-trained on natural images, pointing to practical, data-efficient transfer learning for medical vision foundation models. Code and models are released to enable broader adoption.

Abstract

Parameter-efficient fine-tuning (PEFT), initially developed for exploiting pre-trained large language models, has recently emerged as an effective approach to transfer learning on computer vision tasks. However, the effectiveness of PEFT on medical vision foundation models remains unclear. As a proof of concept, we conducted a detailed empirical study on applying PEFT to chest radiography foundation models. Specifically, we examined LoRA, a representative PEFT method, and compared it against full-parameter fine-tuning (FFT) on two self-supervised radiography foundation models across three well-established chest radiograph datasets. Our results showed that LoRA outperformed FFT in 13 out of 18 transfer learning tasks, by up to 2.9%, using fewer than 1% tunable parameters. Combining LoRA with foundation models, we set a new state of the art on a range of data-efficient learning tasks, such as an AUROC score of 80.6% using 1% labeled data on NIH ChestX-ray14. We hope this study draws more attention from the community to the use of PEFT for transfer learning on medical imaging tasks. Code and models are available at https://github.com/RL4M/MED-PEFT.
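
To give a feel for why LoRA tunes so few parameters, the following is a minimal sketch of the parameter arithmetic. LoRA freezes a pre-trained weight matrix of shape d_out x d_in and trains only a low-rank update B A, with B of shape d_out x r and A of shape r x d_in, so each adapted matrix adds r * (d_in + d_out) trainable parameters. The concrete numbers below are assumptions chosen for illustration and are not taken from the paper: a ViT-Base-like encoder (12 blocks, width 768, MLP ratio 4), rank 8, and adapters on the query and value projections only. The function names are hypothetical, and the count covers the encoder blocks only, ignoring the patch embedding and classification head.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The update is B @ A with B: d_out x rank and A: rank x d_in.
    """
    return rank * (d_in + d_out)


def transformer_block_params(d, mlp_ratio=4):
    """Parameters in one standard pre-norm Transformer encoder block."""
    attn = 4 * (d * d + d)                    # q, k, v, output projections (+ biases)
    hidden = mlp_ratio * d
    mlp = (d * hidden + hidden) + (hidden * d + d)  # two linear layers (+ biases)
    norms = 2 * 2 * d                         # two LayerNorms, weight and bias each
    return attn + mlp + norms


def lora_fraction(depth=12, d=768, rank=8, adapted_per_block=2):
    """Return (lora_params, frozen_params, ratio) for an assumed configuration.

    adapted_per_block=2 assumes adapters on the query and value projections.
    """
    frozen = depth * transformer_block_params(d)
    lora = depth * adapted_per_block * lora_param_count(d, d, rank)
    return lora, frozen, lora / frozen


if __name__ == "__main__":
    lora, frozen, ratio = lora_fraction()
    print(f"LoRA params: {lora:,}  frozen block params: {frozen:,}  ratio: {ratio:.2%}")
```

Under these assumed settings the trainable fraction stays well below 1%, which is the same order of magnitude as the budgets quoted above; the exact percentages reported in the paper depend on its own rank and choice of adapted layers.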

Paper Structure (6 sections, 5 tables)