Aggregation Design for Personalized Federated Multi-Modal Learning over Wireless Networks

Benshun Yin; Zhiyong Chen; Meixia Tao

TL;DR

This work tackles personalized Federated Multi-Modal Learning (FMML) over wireless networks under modality heterogeneity and non-IID data. It introduces a learning-based scheme that optimizes per-device, per-modality aggregation coefficients $\xi^m_{k,k',t}$, and combines it with a modality-aware parameter scheduling policy that uses channel state to upload only a subset of parameters. The aggregation coefficients are updated by gradient-based steps that link each local loss $F_k$ to the server parameters $\bm{W}^m_{k,t}$, enabling personalization without adding communication overhead. Experiments on CREMA-D and MOSEI show higher personalized accuracy and shorter training time than baselines such as FedAvg, FedProx, FedFomo, and FedAMP.

Abstract

Federated Multi-Modal Learning (FMML) is an emerging field that integrates information from different modalities in federated learning to improve learning performance. In this letter, we develop a parameter scheduling scheme to improve personalized performance and communication efficiency in personalized FMML, considering non-independent and non-identically distributed (non-IID) data along with modality heterogeneity. Specifically, a learning-based approach is used to obtain the aggregation coefficients for the parameters of different modalities on distinct devices. Based on the aggregation coefficients and the channel state, a subset of parameters is scheduled to be uploaded to a server for each modality. Experimental results show that the proposed algorithm can effectively improve the personalized performance of FMML.
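To make the idea of learned aggregation coefficients concrete, the following is a minimal sketch for a single modality. It is not the authors' algorithm. It assumes that the server forms device k's personalized parameters as a weighted sum of all devices' local parameters, and that the coefficients are kept non-negative and normalized to sum to one per device. The function names `aggregate` and `update_coeffs`, the step size `lr`, and the projection step are all hypothetical choices made for illustration.

```python
def aggregate(coeffs, local_params):
    """Personalized parameters for one modality (assumed rule):
    W_k = sum_j coeffs[k][j] * w_j, with parameters stored as flat lists."""
    dim = len(local_params[0])
    return [
        [sum(c * w[d] for c, w in zip(row, local_params)) for d in range(dim)]
        for row in coeffs
    ]


def update_coeffs(coeffs, grads, local_params, lr=0.1):
    """One gradient step on the coefficients, followed by an assumed projection.

    grads[k] is the gradient of device k's local loss F_k with respect to
    its aggregated parameters W_k.
    """
    new_coeffs = []
    for row, g in zip(coeffs, grads):
        # Chain rule for the weighted sum: dF_k / dcoeffs[k][j] = <dF_k/dW_k, w_j>.
        stepped = [
            max(c - lr * sum(gi * wi for gi, wi in zip(g, w)), 0.0)
            for c, w in zip(row, local_params)
        ]
        total = sum(stepped)
        # Hypothetical projection: clip at zero, then renormalize the row.
        # If every weight was clipped to zero, keep the previous row.
        new_coeffs.append([s / total for s in stepped] if total > 0 else list(row))
    return new_coeffs
```

In this sketch a device's coefficient for another device grows when that device's parameters point against the local loss gradient, so weight shifts toward devices whose parameters would reduce the local loss.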

Paper Structure (13 sections, 9 equations, 3 figures, 5 tables, 1 algorithm)

Introduction
System Model
Multi-Modal Data and Neural Networks
Personalized Federated Multi-Modal Learning
Local Update Stage
Parameter Aggregation Stage
Aggregation Design for Personalized Federated Multi-Modal Learning
Learn to Update Aggregation Coefficients
Improvement on Communication Efficiency
Simulation Results
Simulation Setup
Performance Comparison
Conclusion

Figures (3)

Figure 1: A federated multi-modal learning system.
Figure 2: Execution process of personalized federated multi-modal learning systems with the update of aggregation coefficients.
Figure 3: The variation of the aggregation coefficients of (a) audio modality, (b) visual modality on CREMA-D with the non-IID-1 distribution.
