FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models

Jingwei Sun, Ziyue Xu, Hongxu Yin, Dong Yang, Daguang Xu, Yiran Chen, Holger R. Roth

TL;DR

FedBPT is a privacy-preserving approach to adapting large pre-trained language models (PLMs) in federated settings through gradient-free, black-box prompt tuning. Clients and server exchange only low-dimensional prompt distributions, and the server aggregates them with CMA-ES, so the framework needs neither direct access to PLM parameters nor backpropagation, which reduces communication, compute, and memory demands. A perturbation-based regularization mechanism mitigates local overfitting on non-IID client data. Experiments on SST-2, AG's News, and Yelp with RoBERTa and Llama 2 show that FedBPT achieves competitive accuracy while delivering orders-of-magnitude reductions in communication and memory usage, making it practical for privacy-preserving PLM fine-tuning in the age of large language models.

Abstract

Pre-trained language models (PLMs) have revolutionized the NLP landscape, achieving stellar performance across diverse tasks. These models, while benefiting from vast training data, often require fine-tuning on specific data to cater to distinct downstream tasks. However, this data adaptation process has inherent security and privacy concerns, particularly when leveraging user-generated, device-residing data. Federated learning (FL) provides a solution, allowing collaborative model fine-tuning without centralized data collection. However, applying FL to fine-tune PLMs is hampered by challenges, including restricted model parameter access, high computational requirements, and communication overheads. This paper introduces Federated Black-box Prompt Tuning (FedBPT), a framework designed to address these challenges. FedBPT does not require the clients to access the model parameters. By focusing on training optimal prompts and utilizing gradient-free optimization methods, FedBPT reduces the number of exchanged variables, boosts communication efficiency, and minimizes computational and storage costs. Experiments highlight the framework's ability to drastically cut communication and memory costs while maintaining competitive performance. Ultimately, FedBPT presents a promising solution for efficient, privacy-preserving fine-tuning of PLMs in the age of large language models.
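
As a rough illustration of the gradient-free idea described in the abstract, the following sketch searches for a low-dimensional prompt vector using only scalar loss values, with no gradients and no access to model parameters. It is a simplified evolution strategy with a diagonal Gaussian, not CMA-ES itself (there is no covariance or step-size adaptation), and `black_box_loss` is a hypothetical toy function standing in for the loss returned by PLM inference. The function names, dimensions, and hyperparameters are all assumptions made for this example.

```python
import numpy as np


def black_box_loss(z: np.ndarray) -> float:
    """Hypothetical stand-in for PLM inference: returns only a scalar loss."""
    target = np.linspace(-1.0, 1.0, z.size)  # toy optimum, not from the paper
    return float(np.sum((z - target) ** 2))


def es_search(loss_fn, dim=8, popsize=20, n_elite=5, steps=60, seed=0):
    """Simplified evolution strategy over a diagonal Gaussian (not full CMA-ES)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)   # mean of the prompt distribution
    sigma = np.ones(dim)   # per-dimension standard deviation
    for _ in range(steps):
        # Sample candidate prompt vectors from the current distribution.
        samples = mean + sigma * rng.standard_normal((popsize, dim))
        # Score each candidate with forward evaluations only.
        losses = np.array([loss_fn(z) for z in samples])
        # Refit the distribution to the best-scoring candidates.
        elite = samples[np.argsort(losses)[:n_elite]]
        mean = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-3  # small floor keeps sampling alive
    return mean, sigma


mean, sigma = es_search(black_box_loss)
initial = black_box_loss(np.zeros(8))
final = black_box_loss(mean)
print(f"loss before search: {initial:.3f}, after search: {final:.3f}")
```

In a federated setting, only the distribution statistics (here `mean` and `sigma`) would need to be exchanged rather than model weights, which is where the communication savings described above would come from.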

Paper Structure

This paper contains 25 sections, 17 equations, 6 figures, 5 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Overview of FedBPT. The clients in FedBPT adopt a gradient-free optimization (CMA-ES) to search for optimal distributions of the prompt based on local data. The clients are not required to access the PLM parameters, and only inference of the PLM is conducted during the search. The server aggregates the uploaded local distributions to derive the globally optimal distribution of the prompt. The global distribution will be sent back to the clients for the next round of search.
  • Figure 2: Comparison of aggregation with plain FedAvg versus FedBPT. FedAvg derives the global distribution by directly averaging the local distribution statistics. In FedBPT, the server applies CMA-ES to derive the global prompt distribution, taking into account the evaluation results of the uploaded local distributions.
  • Figure 3: Confusion matrix of a client whose local data is more than 90% class one.
  • Figure 4: We randomly mask and replace the tokens to perturb a sentence. The PLM should be confused about the perturbed sentence even given an optimal prompt.
  • Figure 5: The results under IID and non-IID settings with Llama 2 as the backbone model.
  • ...and 1 more figure
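
The perturbation described in the Figure 4 caption (randomly masking and replacing tokens) can be sketched as below. The caption does not say how replacement tokens are chosen or what fraction of tokens is perturbed, so this example assumes uniform sampling from a small hypothetical vocabulary and a hypothetical `rate` parameter; `perturb_tokens` is an illustrative name, not a function from the paper.

```python
import random


def perturb_tokens(tokens, vocab, rate=0.3, seed=None):
    """Replace a random subset of tokens with different tokens from vocab.

    Assumption: replacements are drawn uniformly from `vocab`; the source
    caption does not specify the sampling scheme.
    """
    rng = random.Random(seed)
    # Perturb at least one token whenever the sentence is non-empty.
    n_perturb = max(1, round(rate * len(tokens))) if tokens else 0
    positions = rng.sample(range(len(tokens)), n_perturb)
    out = list(tokens)  # copy so the input sentence is left untouched
    for pos in positions:
        # Exclude the original token so the position really changes.
        candidates = [w for w in vocab if w != out[pos]]
        if candidates:
            out[pos] = rng.choice(candidates)
    return out, sorted(positions)


sentence = ["the", "movie", "was", "really", "very", "good"]
toy_vocab = ["apple", "river", "seven", "quickly", "blue", "table"]
perturbed, changed = perturb_tokens(sentence, toy_vocab, rate=0.3, seed=0)
print(perturbed, changed)
```

A prompt that still drives the model to a confident prediction on such perturbed input is likely exploiting the skewed local label distribution rather than the sentence content, which is the intuition behind using perturbed sentences as a regularization signal.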