FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication

Liangqi Yuan, Dong-Jun Han, Vishnu Pandi Chellapandi, Stanislaw H. Żak, Christopher G. Brinton

TL;DR

FedMFS addresses multimodal federated learning (MMFL) on heterogeneous devices by adding a selective modality communication mechanism. Each client scores its modalities using a Shapley-value-based assessment of predictive contribution together with a model-size penalty, and uploads only the selected modality models, balancing accuracy against communication. The server aggregates only the selected modality models, while clients maintain personalized ensembles, enabling efficient and interpretable MMFL. Experiments on ActionSense show substantial reductions in communication overhead with competitive accuracy, demonstrating practical applicability in resource-constrained IoT scenarios.

Abstract

Multimodal federated learning (FL) aims to enrich model training in FL settings where devices are collecting measurements across multiple modalities (e.g., sensors measuring pressure, motion, and other types of data). However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings: (i) the set of modalities collected by each device will be diverse, and (ii) communication limitations prevent devices from uploading all their locally trained modality models to the server. In this paper, we propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS), a new multimodal fusion FL methodology that can tackle the above-mentioned challenges. The key idea is the introduction of a modality selection criterion for each device, which weighs (i) the impact of the modality, gauged by Shapley value analysis, against (ii) the modality model size as a gauge for communication overhead. This enables FedMFS to flexibly balance performance against communication costs, depending on resource constraints and application requirements. Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x.
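
The selection criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's exact formulation: the scoring rule (normalized Shapley magnitude minus normalized model size), the function names shapley_values and select_modalities, the modality names, and all numbers are assumptions made for illustration. The parameter names alpha_s, alpha_c, and gamma are borrowed from the figure captions, and their roles here (impact weight, size weight, number of modalities uploaded) are assumed.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small set of players (here, modalities).

    `value` maps a frozenset of players to a scalar payoff, e.g. the
    validation accuracy of an ensemble built from those modalities.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def select_modalities(shapley, size_mb, alpha_s, alpha_c, gamma):
    """Pick the `gamma` modalities with the best impact/size trade-off.

    Assumed scoring rule (not taken from the paper): reward normalized
    Shapley magnitude, penalize normalized model size.
    """
    max_phi = max(abs(v) for v in shapley.values()) or 1.0
    max_size = max(size_mb.values())
    score = {
        m: alpha_s * abs(shapley[m]) / max_phi - alpha_c * size_mb[m] / max_size
        for m in shapley
    }
    return sorted(score, key=score.get, reverse=True)[:gamma]


# Hypothetical per-modality contributions and model sizes (illustrative only).
contrib = {"eye": 0.1, "emg": 0.3, "tactile": 0.2}
size_mb = {"eye": 1.0, "emg": 2.0, "tactile": 50.0}

# Toy additive payoff: each modality adds a fixed amount to the ensemble.
phi = shapley_values(list(contrib), lambda s: sum(contrib[m] for m in s))
selected = select_modalities(phi, size_mb, alpha_s=0.2, alpha_c=0.8, gamma=1)
```

With a large size weight, the sketch skips the large "tactile" model even though its contribution is the second highest, which is the kind of accuracy versus communication trade-off the abstract describes.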

Paper Structure

This paper contains 12 sections, 12 equations, 3 figures, 2 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Schematic representation of traditional multimodal fusion federated learning vs. the proposed FedMFS.
  • Figure 2: Accuracy of FedMFS with the configuration $\gamma=1, \alpha_s=0.2, \alpha_c=0.8$ compared with four baselines, plotted against communication overhead. Only the first 1000 MB of communication overhead is shown; the data-level fusion approach requires close to 2000 MB to complete all iterations.
  • Figure 3: The impact of modality models on the ensemble model's final prediction throughout the FedMFS iteration, exemplified with the configuration $\gamma=1, \alpha_s=0.2, \alpha_c=0.8$.