Communication-Efficient Multimodal Federated Learning: Joint Modality and Client Selection
Liangqi Yuan, Dong-Jun Han, Su Wang, Devesh Upadhyay, Christopher G. Brinton
TL;DR
This work tackles efficient learning in multimodal federated settings where clients hold heterogeneous sets of modalities and communication is constrained. It introduces mmFedMC, a framework that performs decision-level fusion: modular per-modality models are uploaded to the server, while each client retains its own ensemble model locally, which enables personalization. Modality selection is driven by a composite priority that combines Shapley-value-based impact, modality model size, and recency, while server-side client selection based on local loss further reduces communication. Empirical results across five real-world datasets show that mmFedMC achieves accuracy comparable to several baselines while reducing communication overhead by over 20x, highlighting its practical value for IoT and edge deployments. The approach offers a flexible, modular solution for heterogeneous multimodal FL, with pathways for dynamic configuration and broader modality support in future work.
Abstract
Multimodal federated learning (FL) aims to enrich model training in FL settings where clients are collecting measurements across multiple modalities. However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings where: (i) the set of modalities collected by each client will be diverse, and (ii) communication limitations prevent clients from uploading all their locally trained modality models to the server. In this paper, we propose multimodal Federated learning with joint Modality and Client selection (mmFedMC), a new FL methodology that can tackle the above-mentioned challenges in multimodal settings. The joint selection algorithm incorporates two main components: (a) a modality selection methodology for each client, which weighs (i) the impact of the modality, gauged by Shapley value analysis, and (ii) the modality model size, as a gauge of communication overhead, against (iii) the frequency of modality model updates, denoted recency, to enhance generalizability; and (b) a client selection strategy for the server based on the local loss of the modality model at each client. Experiments on five real-world datasets demonstrate the ability of mmFedMC to achieve comparable accuracy to several baselines while reducing the communication overhead by over 20x. A demo video of our methodology is available at https://liangqiy.com/mmfedmc/.
