Federated Fine-tuning of SAM-Med3D for MRI-based Dementia Classification
Kaouther Mouheb, Marawan Elbatel, Janne Papma, Geert Jan Biessels, Jurgen Claassen, Huub Middelkoop, Barbara van Munster, Wiesje van der Flier, Inez Ramakers, Stefan Klein, Esther E. Bron
TL;DR
This work examines how to fine-tune large 3D foundation models (FMs) effectively in a federated learning (FL) setting for MRI-based dementia classification, evaluating the impact of classification head design, fine-tuning strategy, and aggregation method on diagnostic performance and efficiency. Using SAM-Med3D as the backbone on a large, heterogeneous multi-cohort dataset, the study shows that convolutional classification heads substantially improve accuracy, that freezing the encoder often matches full fine-tuning performance while reducing cost, and that advanced aggregation methods (FedCE, Rate-My-LoRA) can approach or beat centralized fine-tuning by mitigating client heterogeneity. The authors provide an open-source framework for federated 3D FM evaluation and offer actionable guidance: prefer compact convolutional heads for efficient communication, consider encoder freezing in FL, and employ advanced aggregation to boost cross-site performance, especially for data-rich cohorts. These findings support practical deployment of federated FMs in decentralized clinical settings and point to future directions in FL theory and 3D FM development for privacy-preserving medical imaging.
Abstract
While foundation models (FMs) offer strong potential for AI-based dementia diagnosis, their integration into federated learning (FL) systems remains underexplored. In this benchmarking study, we systematically evaluate the impact of three key design choices (classification head architecture, fine-tuning strategy, and aggregation method) on the performance and efficiency of federated FM tuning using brain MRI data. Using a large multi-cohort dataset, we find that the architecture of the classification head substantially influences performance, that freezing the FM encoder achieves results comparable to full fine-tuning, and that advanced aggregation methods outperform standard federated averaging. Our results offer practical insights for deploying FMs in decentralized clinical settings and highlight trade-offs that should guide future method development.
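For readers unfamiliar with the baseline, the sketch below illustrates standard federated averaging combined with a frozen encoder: each client sends only its trainable (non-encoder) parameters, and the server averages them weighted by local sample count. It is a minimal illustration in plain Python, not the authors' framework. The function names, the `encoder.` parameter prefix, and the toy values are assumptions made for the example, and the advanced aggregation methods named above (FedCE, Rate-My-LoRA) are not shown.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat list
# of floats. This is a simplification for illustration only.
Params = Dict[str, List[float]]


def trainable_only(params: Params, frozen_prefix: str = "encoder.") -> Params:
    """Drop parameters whose names start with the (assumed) frozen prefix.

    With a frozen encoder, only the remaining parameters (for example the
    classification head) need to be sent to the server.
    """
    return {k: v for k, v in params.items() if not k.startswith(frozen_prefix)}


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Params = {}
    for name, reference in client_params[0].items():
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return aggregated


# Toy example: two clients with 1 and 3 local samples (hypothetical values).
clients = [
    {"encoder.w": [1.0], "head.w": [0.0, 2.0]},
    {"encoder.w": [1.0], "head.w": [4.0, 6.0]},
]
global_head = fedavg([trainable_only(c) for c in clients], [1, 3])
```

In this toy setup the encoder weights never leave the clients, so the communicated payload shrinks to the head parameters alone, which is the efficiency argument behind encoder freezing.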
