FedDTPT: Federated Discrete and Transferable Prompt Tuning for Black-Box Large Language Models

Jiaqi Wu, Simin Chen, Yuzhe Yang, Yijiang Li, Shiyue Hou, Rui Jing, Zehua Wang, Wei Chen, Zijian Tian

TL;DR

This work proposes FedDTPT, which the authors describe as the first federated discrete and transferable prompt tuning method for black-box large language models, and shows that it achieves higher accuracy, lower communication overhead, and robustness to non-iid data in a black-box setting.

Abstract

In recent years, large language models (LLMs) have significantly advanced the field of natural language processing (NLP). Fine-tuning LLMs on data from specific scenarios lets these foundation models adapt better to various downstream tasks. However, fine-tuning poses privacy leakage risks, particularly when data is processed centrally. To address user privacy concerns, federated learning (FL) has been introduced to mitigate the risks of collecting data from multiple sources in one place. The privacy of the LLMs themselves is equally critical, because potential malicious attacks threaten their security, yet this issue has received limited attention in current research. Establishing a trusted multi-party fine-tuning environment is therefore essential. In addition, deploying LLMs locally incurs significant storage costs and high computational demands. To address these challenges, we propose FedDTPT, to our knowledge the first federated discrete and transferable prompt tuning method for black-box large language models. In the client optimization phase, we adopt a token-level discrete prompt optimization method in which a feedback loop based on prediction accuracy drives gradient-free prompt optimization through the MLM API. For server optimization, we employ an attention mechanism based on semantic similarity to filter all local prompt tokens, together with embedding-distance elbow detection and a DBSCAN clustering strategy to enhance the filtering process. Experimental results demonstrate that, compared to state-of-the-art methods, our approach achieves higher accuracy, reduced communication overhead, and robustness to non-iid data in a black-box setting. Moreover, the optimized prompts are transferable.
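As a rough illustration of the client phase described in the abstract, the following minimal sketch shows a gradient-free, token-level search driven by accuracy feedback. It is not the authors' algorithm: `optimize_prompt`, `toy_propose` (a stand-in for the MLM API that proposes replacement tokens), `toy_accuracy` (a stand-in for the black-box LLM's prediction accuracy), and the round-robin choice of token position are all assumptions made for this example.

```python
from typing import Callable, List, Sequence


def optimize_prompt(
    prompt: List[str],
    propose: Callable[[List[str], int], Sequence[str]],
    accuracy: Callable[[List[str]], float],
    rounds: int = 6,
) -> List[str]:
    """Greedy token-level search: keep a replacement only if accuracy improves."""
    best = list(prompt)
    best_acc = accuracy(best)
    for step in range(rounds):
        pos = step % len(best)  # assumption: visit positions round-robin
        for cand in propose(best, pos):  # candidates for the masked position
            trial = best[:pos] + [cand] + best[pos + 1:]
            acc = accuracy(trial)  # feedback from the black-box model
            if acc > best_acc:
                best, best_acc = trial, acc
    return best


# Hypothetical stand-ins so the sketch runs without any real model or API.
VOCAB = ["classify", "the", "sentiment", "of", "this", "review", "banana"]
TARGET = ["classify", "the", "sentiment"]


def toy_propose(tokens: List[str], pos: int) -> Sequence[str]:
    # A real MLM would rank fillers for the masked slot; here every token is offered.
    return VOCAB


def toy_accuracy(tokens: List[str]) -> float:
    # Pretend accuracy: fraction of positions that match a hidden "good" prompt.
    return sum(t == g for t, g in zip(tokens, TARGET)) / len(TARGET)


start = ["banana", "banana", "banana"]
result = optimize_prompt(start, toy_propose, toy_accuracy, rounds=6)
print(result, toy_accuracy(result))
```

No gradients are used anywhere in the loop: the only signal is the scalar returned by `accuracy`, which is what makes this style of search compatible with a model exposed only through an API.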

Paper Structure

This paper contains 14 sections, 6 equations, 2 figures, 6 tables, 2 algorithms.

Figures (2)

  • Figure 1: The structure of FedDTPT. The client uses prediction results as feedback to drive the MLM API for discrete prompt optimization. The locally optimized prompts are then uploaded to the server, where tokens are mapped to a high-dimensional latent space. Similarity calculations on these high-dimensional embeddings yield weight values $W$, and a clustering strategy is applied to select high-weight tokens. These tokens are then combined to form a global prompt, which is subsequently distributed back to the clients.
  • Figure 2: The accuracy of FedDTPT under different seeds
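
The server step described in the Figure 1 caption (embed tokens, compute similarity-based weights $W$, cluster, keep high-weight tokens) could be sketched as follows. This is one plausible reading under stated assumptions, not the paper's implementation: the 2-D embeddings are invented, the weights are a softmax over mean cosine similarity, `eps` is fixed by hand rather than found by elbow detection, and keeping the single highest-weight token per cluster is an assumed selection rule. The DBSCAN routine is a small pure-Python version written for this example.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def similarity_weights(embs: List[Sequence[float]]) -> List[float]:
    """Softmax over each token's mean cosine similarity to the other tokens."""
    n = len(embs)
    scores = [
        sum(cosine(embs[i], embs[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN; returns a cluster id per point, NOISE for outliers."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb_j)
    return [lab if lab is not None else NOISE for lab in labels]


def aggregate(tokens, embs, eps, min_pts):
    """Assumed rule: drop noise tokens, keep the top-weight token per cluster."""
    weights = similarity_weights(embs)
    labels = dbscan(embs, eps, min_pts)
    best = {}
    for i, lab in enumerate(labels):
        if lab == NOISE:
            continue
        if lab not in best or weights[i] > weights[best[lab]]:
            best[lab] = i
    return [tokens[i] for i in sorted(best.values())]


# Invented tokens and 2-D embeddings standing in for uploaded client prompts.
TOKENS = ["classify", "categorize", "label", "sentiment", "emotion", "banana"]
EMBS = [(1.0, 0.0), (0.9, 0.1), (0.95, 0.05), (0.0, 1.0), (0.1, 0.9), (-1.0, -1.0)]

weights = similarity_weights(EMBS)
labels = dbscan(EMBS, eps=0.2, min_pts=2)
global_prompt = aggregate(TOKENS, EMBS, eps=0.2, min_pts=2)
print(labels, global_prompt)
```

In this toy run the outlier token falls into no cluster and is filtered out, while each group of near-synonyms contributes one token to the global prompt. Only discrete tokens travel between client and server in such a scheme, which is consistent with the reduced communication overhead the abstract reports.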