Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency

Duy Phuong Nguyen; J. Pablo Munoz; Tanya Roosta; Ali Jannesari

TL;DR

Federated learning struggles to generalize when client data is highly non-IID, and large multimodal models add substantial communication costs. The authors propose FedDLP, a dual-adapter framework that attaches a larger local LoRA adapter for client-specific personalization and a smaller global LoRA adapter for cross-client knowledge sharing, complemented by SoRA-style pruning to curb transmission and computation. The local and global adapters exchange knowledge through bi-directional distillation and are trained with a combination of cross-entropy and knowledge-distillation (KD) losses. Only the global adapters are aggregated on the server; pruning is applied to the local adapters to maintain efficiency. Across vision and language tasks, FedDLP achieves higher test accuracy, lower performance variance among clients, and lower communication and computation costs than the baselines, demonstrating scalable, personalized FL for large CLIP-like models.
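To make the training signal described above concrete, the following is a minimal sketch of a combined cross-entropy and KD objective for two adapters that distill into each other, plus a server step that averages only the global adapters. It is an illustration under stated assumptions, not the authors' implementation: the function names, the KL-divergence form of the KD term, the `kd_weight` and `temperature` parameters, and the unweighted mean on the server are all hypothetical choices that the summary does not specify.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Cross-entropy of the true label under softmax(logits)."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dual_adapter_losses(local_logits, global_logits, label,
                        kd_weight=0.5, temperature=2.0):
    """Return (loss_local, loss_global) for one example.

    Assumption: each adapter is trained with cross-entropy plus a KL term
    that pulls it toward the other adapter's softened prediction. In an
    autodiff framework the teacher side of each KL term would typically
    be detached from the gradient.
    """
    p_local = softmax(local_logits, temperature)
    p_global = softmax(global_logits, temperature)
    # Local adapter as student, global adapter as teacher.
    loss_local = (cross_entropy(local_logits, label)
                  + kd_weight * kl_divergence(p_global, p_local))
    # Global adapter as student, local adapter as teacher.
    loss_global = (cross_entropy(global_logits, label)
                   + kd_weight * kl_divergence(p_local, p_global))
    return loss_local, loss_global

def average_global_adapters(client_adapters):
    """Element-wise mean of the clients' global-adapter parameters.

    Each client contributes a dict mapping a parameter name to a flat list
    of floats. Local adapters never reach this function, mirroring the
    statement that only global adapters are aggregated.
    """
    n = len(client_adapters)
    return {
        name: [sum(vals) / n
               for vals in zip(*(c[name] for c in client_adapters))]
        for name in client_adapters[0]
    }
```

When the two adapters agree, both KD terms vanish and each loss reduces to plain cross-entropy; when they disagree, each adapter is penalized in proportion to its divergence from the other.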

Abstract

Federated Learning (FL) enables collaborative learning across distributed clients while preserving data privacy. However, FL faces significant challenges when dealing with heterogeneous data distributions, which can lead to suboptimal global models that fail to generalize across diverse clients. In this work, we propose a novel framework designed to tackle these challenges by introducing a dual-adapter approach. The method utilizes a larger local adapter for client-specific personalization and a smaller global adapter to facilitate efficient knowledge sharing across clients. Additionally, we incorporate a pruning mechanism to reduce communication overhead by selectively removing less impactful parameters from the local adapter. Through extensive experiments on a range of vision and language tasks, our method demonstrates superior performance compared to existing approaches. It achieves higher test accuracy, lower performance variance among clients, and improved worst-case performance, all while significantly reducing communication and computation costs. Overall, the proposed method addresses the critical trade-off between model personalization and generalization, offering a scalable solution for real-world FL applications.
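The pruning step can be illustrated with a small sketch in the spirit of SoRA, which the summary names as the style of pruning used. The sketch assumes a gate vector with one entry per LoRA rank, shrunk by soft-thresholding (the proximal step for an L1 penalty), after which ranks whose gate reaches zero are dropped. The helper names, the list-based parameter layout, and the threshold value are hypothetical, and the paper's actual criterion for "less impactful parameters" may differ.

```python
import math

def soft_threshold(gate, threshold):
    """Shrink each gate entry toward zero by `threshold`, clamping at zero."""
    return [math.copysign(max(abs(g) - threshold, 0.0), g) for g in gate]

def prune_lora_ranks(down, up, gate):
    """Drop every rank whose gate entry is exactly zero.

    Assumed layout: down[i] is the i-th row of the down-projection, up[i]
    is the i-th column of the up-projection, and gate[i] scales rank i.
    Returns the pruned (down, up, gate) and the indices that were kept.
    """
    keep = [i for i, g in enumerate(gate) if g != 0.0]
    return ([down[i] for i in keep],
            [up[i] for i in keep],
            [gate[i] for i in keep],
            keep)

# Usage: a rank-4 adapter where two gates fall below the threshold.
gate = soft_threshold([0.5, -0.05, 0.02, -0.8], threshold=0.1)
down = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
up = [[0.1], [0.2], [0.3], [0.4]]
down_p, up_p, gate_p, kept = prune_lora_ranks(down, up, gate)
```

Under these assumptions the adapter shrinks from rank 4 to rank 2, so fewer parameters need to be stored, updated, or transmitted.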
