CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities
Pranav Poudel, Prashant Shrestha, Sanskar Amgain, Yash Raj Shrestha, Prashnna Gyawali, Binod Bhattarai
TL;DR
This work tackles missing modalities in multimodal federated learning for healthcare while preserving data privacy. It introduces CAR-MFL, a retrieval-based cross-modal augmentation method that augments unimodal clients with complementary modalities drawn from a small public multimodal dataset via intra-modal retrieval and label-aware refinement. During federated training, a fixed-constraint weight adjustment is applied to the complementary encoders to mitigate label noise, and augmentations are performed locally to avoid sharing pairing information. Empirical results on chest X-ray benchmarks show CAR-MFL consistently outperforms baselines such as mFedAvgP and CreamFL across both homogeneous and heterogeneous partitions, with robustness to limited public data and improved handling of rare pathologies. This approach enables practical deployment of multimodal FL in healthcare with missing modalities without requiring large public datasets or synchronized representations.
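The retrieval step summarized above can be illustrated with a minimal sketch. It assumes an image-only client, a small public set of image-text pairs, Euclidean distance on precomputed image features, and Jaccard overlap on label sets as the refinement criterion. The names `retrieve_missing_text`, `euclidean`, `jaccard`, and the toy data are hypothetical and are not taken from the authors' code.

```python
import math

def euclidean(a, b):
    # Distance between two feature vectors of the same modality.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jaccard(a, b):
    # Label-set overlap; two empty label sets count as identical (assumption).
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def retrieve_missing_text(query_feat, query_labels, public, k=3):
    # Intra-modal retrieval: the k public samples whose image features
    # are closest to the client's image features.
    nearest = sorted(public, key=lambda p: euclidean(query_feat, p["feat"]))[:k]
    # Label-aware refinement: among those candidates, keep the one whose
    # labels overlap most with the client's labels, and borrow its text.
    best = max(nearest, key=lambda p: jaccard(query_labels, p["labels"]))
    return best["text"]

# Toy public multimodal dataset (hypothetical values).
public = [
    {"feat": [0.0, 0.0], "labels": {"edema"}, "text": "report A"},
    {"feat": [0.1, 0.0], "labels": {"pneumonia"}, "text": "report B"},
    {"feat": [5.0, 5.0], "labels": {"pneumonia"}, "text": "report C"},
]

# An image-only client sample receives a text report from the public set.
paired_text = retrieve_missing_text([0.05, 0.0], {"pneumonia"}, public, k=2)
```

In this sketch the lookup runs entirely on the client, so the choice of which public sample was paired with which private sample never leaves it, in line with the local augmentation described above.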
Abstract
Multimodal AI has demonstrated superior performance over unimodal approaches by leveraging diverse data sources for more comprehensive analysis. However, realizing these gains in healthcare is challenging due to the limited availability of public datasets. Federated learning presents an exciting solution, allowing the use of extensive databases from hospitals and health centers without centralizing sensitive data, thus maintaining privacy and security. Yet research in multimodal federated learning, particularly in scenarios with missing modalities, a common issue in healthcare datasets, remains scarce, highlighting a critical area for future exploration. To this end, we propose a novel method for multimodal federated learning with missing modalities. Our contribution is a novel cross-modal data augmentation by retrieval that leverages a small publicly available dataset to fill in the missing modalities at the clients. Our method learns the parameters in a federated manner, ensuring privacy protection and improving performance on multiple challenging multimodal benchmarks in the medical domain, surpassing several competitive baselines. Code is available at https://github.com/bhattarailab/CAR-MFL
