Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection

Yunfeng Fan, Wenchao Xu, Haozhao Wang, Fushuo Huo, Jinyu Chen, Song Guo

TL;DR

This work tackles modal bias in multi-modal federated learning caused by uneven modality distributions across clients. It introduces BMSFed, a framework that combines a modal enhancement (ME) loss based on aggregated global prototypes with a balanced modality selection mechanism that decouples gradient contributions through two submodular objectives, one over multi-modal clients and one over uni-modal clients. Empirical results on CREMA-D, AVE, CG-MNIST, and ModelNet40 under IID and non-IID settings show that BMSFed outperforms baselines, improves the weak modality, and mitigates global modality bias without extra communication or computation. The approach is robust to modality incongruity and holds up across data distributions, which makes it practical for real-world multi-modal FL deployments.

Abstract

Selecting proper clients to participate in each federated learning (FL) round is critical to effectively harness a broad range of distributed data. Existing client selection methods consider only the mining of distributed uni-modal data, yet their effectiveness may diminish in multi-modal FL (MFL), where the modality imbalance problem not only impedes collaborative local training but also leads to a severe global modality-level bias. We empirically reveal that local training with a certain single modality may contribute more to the global model than training with all local modalities. To effectively exploit the distributed multiple modalities, we propose a novel Balanced Modality Selection framework for MFL (BMSFed) to overcome the modal bias. On the one hand, we introduce a modal enhancement loss during local training that alleviates local imbalance using the aggregated global prototypes. On the other hand, we propose a modality selection strategy that selects subsets of local modalities with great diversity while achieving global modal balance. Our extensive experiments on audio-visual, colored-gray, and front-back datasets showcase the superiority of BMSFed over baselines and its effectiveness in multi-modal data exploitation.
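
The abstract does not give the form of the modal enhancement loss, so the following is only a minimal sketch of one common prototype-based formulation, not the authors' definition. It assumes that prototypes are per-class mean features, that the server averages them weighted by sample count, and that the loss is a cross-entropy over negative squared distances to the global prototypes. All function names here are hypothetical.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Client side: mean feature vector per class (zeros for absent classes)."""
    protos = np.zeros((num_classes, features.shape[1]))
    counts = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        counts[c] = mask.sum()
        if counts[c] > 0:
            protos[c] = features[mask].mean(axis=0)
    return protos, counts

def aggregate_prototypes(client_protos, client_counts):
    """Server side: count-weighted average of client prototypes per class."""
    protos = np.stack(client_protos)    # shape (K, C, D)
    counts = np.stack(client_counts)    # shape (K, C)
    weighted = (protos * counts[:, :, None]).sum(axis=0)
    total = np.maximum(counts.sum(axis=0), 1)  # avoid division by zero
    return weighted / total[:, None]

def prototype_loss(features, labels, global_protos):
    """Assumed loss: cross-entropy over negative squared prototype distances."""
    diff = features[:, None, :] - global_protos[None, :, :]
    logits = -(diff ** 2).sum(axis=-1)                    # shape (N, C)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy example: two clients holding weak-modality features for two classes.
f1, y1 = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]), np.array([0, 0, 1])
f2, y2 = np.array([[4.0, 0.0], [10.0, 12.0]]), np.array([0, 1])
p1, c1 = class_prototypes(f1, y1, num_classes=2)
p2, c2 = class_prototypes(f2, y2, num_classes=2)
global_protos = aggregate_prototypes([p1, p2], [c1, c2])
loss = prototype_loss(f1, y1, global_protos)
```

In a setup like this, a client would add such a term for its weak modality to its usual local objective, pulling that modality's features toward the shared class prototypes.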

Paper Structure

This paper contains 20 sections, 19 equations, 7 figures, 9 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Left: Traditional client selection in FL samples a subset of clients in each round, whereas our modality selection treats each local modality as the sampling unit. Right: The BMSFed paradigm with four clients. The global prototypes are used to enhance the weak modality during local updates. Only the networks corresponding to the selected modalities are uploaded to the server for aggregation.
  • Figure 2: Test accuracy of BMSFed compared with other baselines on CREMA-D and AVE. BMSFed converges to more accurate solutions than all baselines.
  • Figure 3: Robustness validation with respect to data size, number of local epochs, and number of clients. BMSFed consistently outperforms the baseline (FedAvg) across these scenarios.
  • Figure 4: Proportional change of the audio and visual modalities, and the curve of the global imbalance ratio, during training on CREMA-D under the IID setting.
  • Figure 5: Visualization of the data distribution on CREMA-D.
  • ...and 2 more figures
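
The idea in the Figure 1 caption, sampling local modalities rather than whole clients, can be illustrated with a generic greedy submodular selection. This is a sketch under stated assumptions, not the paper's objectives: it uses a facility-location objective over a non-negative cosine similarity between per-unit gradient vectors. The `units` list, the toy gradients, and the function name are hypothetical.

```python
import numpy as np

def greedy_facility_location(similarity, budget):
    """Greedily pick `budget` units maximizing sum_i max_{j in S} similarity[i, j].

    Assumes `similarity` is square with non-negative entries.
    """
    n = similarity.shape[0]
    selected = []
    taken = np.zeros(n, dtype=bool)
    best_cover = np.zeros(n)  # best similarity of each unit to the chosen set
    for _ in range(min(budget, n)):
        # Marginal gain of adding each candidate column j to the chosen set.
        covered = np.maximum(similarity, best_cover[:, None]).sum(axis=0)
        gains = covered - best_cover.sum()
        gains[taken] = -np.inf  # never pick the same unit twice
        j = int(np.argmax(gains))
        selected.append(j)
        taken[j] = True
        best_cover = np.maximum(best_cover, similarity[:, j])
    return selected

# Each sampling unit is a (client, modality) pair, not a whole client.
units = [("client0", "audio"), ("client0", "visual"),
         ("client1", "audio"), ("client2", "visual")]
# Toy stand-ins for per-unit gradient vectors (hypothetical values).
grads = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
normed = grads / np.linalg.norm(grads, axis=1, keepdims=True)
similarity = np.clip(normed @ normed.T, 0.0, None)  # non-negative cosine

selected = greedy_facility_location(similarity, budget=2)
chosen_units = [units[j] for j in selected]
```

Because the facility-location objective rewards covering units that are not yet well represented, the second pick in this toy example comes from the modality the first pick did not cover. Only the networks for the chosen (client, modality) pairs would then be uploaded for aggregation.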