Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency

Duy Phuong Nguyen, J. Pablo Munoz, Tanya Roosta, Ali Jannesari

TL;DR

Federated learning with highly non-IID client data struggles to generalize while large multimodal models incur substantial communication costs. The authors propose FedDLP, a dual-adapter framework that attaches a large local LoRA for client-specific personalization and a smaller global LoRA for cross-client knowledge sharing, complemented by SoRA-style pruning to curb transmission and computation. Local and global adapters exchange knowledge through bi-directional distillation and are trained with a combination of cross-entropy and KD losses, while only the global adapters are aggregated server-side; pruning is applied to the local adapters to maintain efficiency. Across vision and language tasks, FedDLP achieves higher test accuracy, reduced performance variance, and lower communication/computation costs compared to baselines, demonstrating scalable, personalized FL for large CLIP-like models.
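The bi-directional distillation described above can be illustrated with a minimal sketch. It assumes each adapter path produces class logits for the same input and that each objective is cross-entropy plus a weighted KL term. The names `kd_weight` and `temperature` and the simple additive weighting are assumptions made for illustration; the summary does not give the exact loss formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one example against its integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student) for two discrete distributions."""
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )


def bidirectional_losses(local_logits, global_logits, label,
                         kd_weight=0.5, temperature=1.0):
    """Return (local_loss, global_loss) for one example.

    Hypothetical formulation: each adapter is trained with CE plus a KL term
    in which the other adapter acts as the teacher. In a real training loop
    the teacher's probabilities would be treated as constants (no gradient).
    """
    p_local = softmax(local_logits, temperature)
    p_global = softmax(global_logits, temperature)
    # Global adapter teaches the local adapter.
    local_loss = (cross_entropy(local_logits, label)
                  + kd_weight * kl_divergence(p_global, p_local))
    # Local adapter teaches the global adapter.
    global_loss = (cross_entropy(global_logits, label)
                   + kd_weight * kl_divergence(p_local, p_global))
    return local_loss, global_loss
```

When both adapters produce identical logits, both KL terms vanish and each loss reduces to plain cross-entropy; the distillation terms only contribute when the two adapters disagree.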

Abstract

Federated Learning (FL) enables collaborative learning across distributed clients while preserving data privacy. However, FL faces significant challenges when dealing with heterogeneous data distributions, which can lead to suboptimal global models that fail to generalize across diverse clients. In this work, we propose a novel framework designed to tackle these challenges by introducing a dual-adapter approach. The method utilizes a larger local adapter for client-specific personalization and a smaller global adapter to facilitate efficient knowledge sharing across clients. Additionally, we incorporate a pruning mechanism to reduce communication overhead by selectively removing less impactful parameters from the local adapter. Through extensive experiments on a range of vision and language tasks, our method demonstrates superior performance compared to existing approaches. It achieves higher test accuracy, lower performance variance among clients, and improved worst-case performance, all while significantly reducing communication and computation costs. Overall, the proposed method addresses the critical trade-off between model personalization and generalization, offering a scalable solution for real-world FL applications.

Paper Structure

This paper contains 31 sections, 11 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Overview of our proposed method in the federated learning setup. Each client is equipped with a base model consisting of two separate LoRA adapters: a larger, tunable local LoRA adapter (orange) that is kept private to each client and a smaller, tunable global LoRA adapter (green) that is communicated to and from the central server. During training, the global adapter is aggregated across all clients, while the local adapter remains personalized. At inference, the final model is a combination of the frozen base model and personalized local LoRA for each client. The figure is best viewed in color.
  • Figure 2: Comparison of vanilla LoRA and SoRA (both with rank = 2) in pFL on the Flowers102 dataset [nilsback2008flowers]. Both methods are trained locally without communication. While LoRA initially improves, it suffers from overfitting and performance degradation as training progresses, especially on non-IID data. In contrast, SoRA, with its sparsity mechanism, demonstrates better generalization and maintains higher Top-1 accuracy across rounds, highlighting the effectiveness of structured pruning in handling heterogeneous client data. Both adapters are applied to the image encoder of CLIP [radford2021clip].
  • Figure 3: The overall training scheme within each client. Each client has a CLIP model with two separate LoRA adapters: a larger local LoRA (orange) and a smaller global LoRA (green). During training, the local LoRA is pruned to reduce communication overhead, and the local adapter is updated using cross-entropy (CE) loss while knowledge distillation (Kullback-Leibler (KL) divergence) is performed from the global adapter (Teacher) to the local adapter (Student). The global LoRA, which is shared across clients, is also updated using CE loss and KL divergence from the local adapter. This bi-directional distillation ensures that the global adapter retains generalizable knowledge while the local adapter remains personalized to the client's data. The pruned local adapter and the smaller global adapter are aggregated at the server for global updates. The figure is best viewed in color.
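Two mechanisms mentioned in the captions above, sparsity-based pruning of the local adapter and server-side aggregation of the global adapter, can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the pruning step assumes a SoRA-style gate vector over the LoRA rank dimensions that is shrunk by soft-thresholding, and the aggregation assumes FedAvg-style weighting by client sample count. The function names, the flattened-list representation of adapter parameters, and the threshold value are all hypothetical.

```python
import math


def soft_threshold(gate, threshold):
    """Shrink each gate entry toward zero; entries within the threshold become 0.

    This is the proximal step for an L1 penalty, assumed here as the
    sparsification rule for a SoRA-style gate over rank dimensions.
    """
    return [math.copysign(max(abs(g) - threshold, 0.0), g) for g in gate]


def active_rank(gate):
    """Number of rank dimensions that survive pruning (non-zero gates)."""
    return sum(1 for g in gate if g != 0.0)


def aggregate_global(client_adapters, client_sizes):
    """Sample-weighted average of flattened global-adapter parameters.

    Assumption: FedAvg-style weighting; the captions only say the global
    adapter is aggregated across clients.
    """
    total = sum(client_sizes)
    dim = len(client_adapters[0])
    return [
        sum(params[i] * n for params, n in zip(client_adapters, client_sizes)) / total
        for i in range(dim)
    ]


# Usage: prune a rank-4 local gate, then average two clients' global adapters.
pruned_gate = soft_threshold([0.9, -0.05, 0.2, -0.6], threshold=0.25)
remaining = active_rank(pruned_gate)
merged = aggregate_global([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
```

In this sketch, rank dimensions whose gate magnitude falls below the threshold are zeroed and can be dropped from the adapter, which is where the compute and transmission savings would come from.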