Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification
Naif Alkhunaizi, Faris Almalik, Rouqaiah Al-Refai, Muzammal Naseer, Karthik Nandakumar
TL;DR
The paper investigates federated parameter-efficient fine-tuning (PEFT) of Vision Transformers for medical image classification, addressing data scarcity, privacy, and communication constraints across institutions. It systematically evaluates multiple federated PEFT strategies, including Visual Prompt Tuning (VPT), low-rank adaptation (LoRA), decomposed visual prompts (DVPT), and stochastic block attention (SBA), as well as hybrid combinations, under IID and non-IID client distributions and under in-domain and out-of-domain (OOD) transfer. The findings show a clear trade-off: many methods dramatically reduce the number of exchanged parameters, but accuracy can degrade on OOD data with non-IID client distributions, where each order-of-magnitude reduction in parameters can cost about 4% in accuracy. The work emphasizes the importance of starting from in-domain medical foundation models when possible and highlights the relative robustness of visual prompts over textual prompts for medical imaging tasks, informing practical deployment of federated PEFT in healthcare.
Abstract
With the advent of large pre-trained transformer models, fine-tuning these models for various downstream tasks is a critical problem. Paucity of training data, the existence of data silos, and stringent privacy constraints exacerbate this fine-tuning problem in the medical imaging domain, creating a strong need for algorithms that enable collaborative fine-tuning of pre-trained models. Moreover, the large size of these models necessitates the use of parameter-efficient fine-tuning (PEFT) to reduce the communication burden in federated learning. In this work, we systematically investigate various federated PEFT strategies for adapting a Vision Transformer (ViT) model (pre-trained on a large natural image dataset) for medical image classification. Apart from evaluating known PEFT techniques, we introduce new federated variants of PEFT algorithms such as visual prompt tuning (VPT), low-rank decomposition of visual prompts, stochastic block attention fine-tuning, and hybrid PEFT methods like low-rank adaptation (LoRA)+VPT. Moreover, we perform a thorough empirical analysis to identify the optimal PEFT method for the federated setting and understand the impact of data distribution on federated PEFT, especially for out-of-domain (OOD) and non-IID data. The key insight of this study is that while most federated PEFT methods work well for in-domain transfer, there is a substantial accuracy vs. efficiency trade-off when dealing with OOD and non-IID scenarios, which are common in medical imaging. Specifically, every order-of-magnitude reduction in fine-tuned/exchanged parameters can lead to a 4% drop in accuracy. Thus, the initial model choice is crucial for federated PEFT. It is preferable to use medical foundation models learned from in-domain medical image data (if available) rather than general vision models.
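To make the communication saving concrete, the following is a minimal sketch of one server-side aggregation round in which clients exchange only their PEFT parameters (for example, prompt tokens and a classification head) while the frozen backbone never leaves the client. It assumes plain sample-size-weighted averaging in the style of FedAvg; the abstract does not state which aggregation rule the paper uses. The names `fedavg_peft`, `exchanged_count`, `prompt`, and `head`, and all numeric values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Parameter name -> flattened values. Only trainable PEFT tensors are stored;
# the frozen ViT backbone is assumed to stay local and is never exchanged.
Params = Dict[str, List[float]]


def fedavg_peft(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average the PEFT parameters of all clients, weighted by local dataset size."""
    total = sum(client_sizes)
    aggregated: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


def exchanged_count(params: Params) -> int:
    """Number of scalar values a client sends per round."""
    return sum(len(values) for values in params.values())


# Two hypothetical clients with toy-sized "prompt" and "head" tensors.
clients = [
    {"prompt": [1.0, 2.0], "head": [0.0]},
    {"prompt": [3.0, 4.0], "head": [4.0]},
]
sizes = [100, 300]  # assumed number of local training samples per client

global_peft = fedavg_peft(clients, sizes)
print(global_peft)
print(exchanged_count(global_peft))
```

The per-round communication cost scales with `exchanged_count` rather than with the size of the full model, which is the quantity the accuracy vs. efficiency trade-off above refers to.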
