FedMCP: Parameter-Efficient Federated Learning with Model-Contrastive Personalization

Qianyi Zhao; Chen Qu; Cen Chen; Mingyuan Fan; Yanhao Wang

TL;DR

This work addresses privacy-preserving fine-tuning of pretrained language models (PLMs) in federated learning, where the size of PLMs imposes high communication and computation costs and the heterogeneity of data and tasks across clients degrades performance. It introduces FedMCP, which adds a global adapter and a private adapter to a frozen PLM, aggregates only the global adapter across clients, and uses a model-contrastive loss based on centered kernel alignment (CKA) to balance universal and client-specific knowledge. The training objective combines a local cross-entropy loss computed on the full model, a regularization term for the global adapter, and the model-contrastive term, which together promote both generalization and personalization. Experiments on a cross-task, cross-silo setup built from GLUE show FedMCP outperforming baselines by about 1.5% on average while sharply reducing the number of trainable and communicated parameters, which makes it practical for federated NLP with PLMs.
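To make the similarity measure named above concrete, here is a minimal sketch of linear CKA between two batches of sentence representations. It assumes the linear-kernel form of CKA; this summary does not state which kernel the paper uses, and the function name `linear_cka` and the random inputs are illustrative only, not taken from the paper.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two representation matrices of shape (n_samples, dim).

    Assumption: linear kernel. The paper may use a different kernel.
    """
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(0)
reps = rng.normal(size=(32, 8))               # hypothetical batch of representations
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix

same = linear_cka(reps, reps)                 # identical representations
rotated = linear_cka(reps, 3.0 * reps @ q)    # rotated and rescaled copy
unrelated = linear_cka(reps, rng.normal(size=(32, 8)))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either input, so `same` and `rotated` both come out at 1 up to floating-point error, whereas cosine similarity between the raw vectors would change under the rotation.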

Abstract

With increasing concerns and regulations on data privacy, fine-tuning pretrained language models (PLMs) in federated learning (FL) has become a common paradigm for NLP tasks. Despite being extensively studied, the existing methods for this problem still face two primary challenges. First, the huge number of parameters in large-scale PLMs leads to excessive communication and computational overhead. Second, the heterogeneity of data and tasks across clients poses a significant obstacle to achieving the desired fine-tuning performance. To address the above problems, we propose FedMCP, a novel parameter-efficient fine-tuning method with model-contrastive personalization for FL. Specifically, FedMCP adds two lightweight adapter modules, i.e., the global adapter and the private adapter, to the frozen PLMs within clients. In a communication round, each client sends only the global adapter to the server for federated aggregation. Furthermore, FedMCP introduces a model-contrastive regularization term between the two adapters. This, on the one hand, encourages the global adapter to assimilate universal knowledge and, on the other hand, the private adapter to capture client-specific knowledge. By leveraging both adapters, FedMCP can effectively provide fine-tuned personalized models tailored to individual clients. Extensive experiments on highly heterogeneous cross-task, cross-silo datasets show that FedMCP achieves substantial performance improvements over state-of-the-art FL fine-tuning approaches for PLMs.
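The communication pattern described in the abstract can be sketched as follows: each client uploads only its global adapter, the server averages those uploads, and the private adapter never leaves the client. This is a minimal sketch, not the paper's algorithm. The FedAvg-style weighting by local sample count, the function `aggregate_global_adapters`, and the parameter names `down` and `up` are all assumptions made for illustration.

```python
import numpy as np

def aggregate_global_adapters(client_updates):
    """Average global-adapter parameters across clients.

    client_updates: list of (num_samples, {param_name: array}) pairs that
    contain global-adapter parameters only. Weighting by sample count is
    an assumption (FedAvg-style), not confirmed by this summary.
    """
    total = sum(n for n, _ in client_updates)
    names = client_updates[0][1].keys()
    return {
        name: sum((n / total) * params[name] for n, params in client_updates)
        for name in names
    }

# Two hypothetical clients; each keeps a private adapter that is never uploaded.
client_a = {"global": {"down": np.ones((2, 2)), "up": np.zeros((2, 2))},
            "private": {"down": np.full((2, 2), 7.0)}}
client_b = {"global": {"down": 3 * np.ones((2, 2)), "up": np.ones((2, 2))},
            "private": {"down": np.full((2, 2), -7.0)}}

# Only the "global" entries are sent, with hypothetical local dataset sizes.
aggregated = aggregate_global_adapters([
    (100, client_a["global"]),
    (300, client_b["global"]),
])

# Each client overwrites its global adapter with the aggregate and keeps
# its private adapter unchanged.
client_a["global"] = aggregated
client_b["global"] = aggregated
```

With sample counts of 100 and 300, every entry of the aggregated `down` matrix is 0.25 * 1 + 0.75 * 3 = 2.5, while the two private adapters remain different across clients.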

Paper Structure

This paper contains 18 sections, 10 equations, 5 figures, 2 tables, and 1 algorithm.

Introduction
Related Work
(Personalized) Federated Learning
Federated Learning for NLP
Preliminaries
Our Method
Overview
Model Architecture
Global Adapter Learning
Model-Contrastive Personalization
Local Training and Global Aggregation
Experiments
Dataset Construction
Baselines
Hyperparameters and Implementation
...and 3 more sections

Figures (5)

Figure 1: Comparison of FedAvg with PEFT and FedMCP, where $\mathcal{A}$ and $\mathcal{B}$ refer to the adapter and backbone modules, respectively, and the snowflake icon indicates that the backbone is frozen, with only the adapters trainable.
Figure 2: Overview of the FedMCP method. (a) Federated model-contrastive personalization workflow; (b) Overall model structure; (c) Detailed structure of the two adapters and BERT blocks.
Figure 3: Comparison of FedMCP and FedAvg (PEFT) for the average and standard deviation of accuracy during 25 communication rounds in six clients.
Figure 4: Effect of similarity metric (CKA vs. cosine similarity) used in model-contrastive personalization on the performance of FedMCP.
Figure 5: Effect of sentence representation ([CLS] token vs. average pooling) used in model-contrastive personalization on the performance of FedMCP.
