LanFL: Differentially Private Federated Learning with Large Language Models using Synthetic Samples

Huiyu Wu, Diego Klabjan

TL;DR

This paper introduces LanFL, a novel FL scheme for LLMs that is purely prompt-based and treats the underlying LLMs as black boxes. It develops a differentially private synthetic sample generation mechanism to facilitate knowledge sharing among participants, along with a prompt optimization scheme that enables learning from synthetic samples.

Abstract

Federated Learning (FL) is a collaborative, privacy-preserving machine learning framework that enables multiple participants to train a single global model. However, the recent advent of powerful Large Language Models (LLMs) with tens to hundreds of billions of parameters makes the naive application of traditional FL methods to LLMs impractical due to high computational and communication costs. Furthermore, end users of LLMs often lack access to full architectures and weights of the models, making it impossible for participants to fine-tune these models directly. This paper introduces a novel FL scheme for LLMs, named LanFL, which is purely prompt-based and treats the underlying LLMs as black boxes. We have developed a differentially private synthetic sample generation mechanism to facilitate knowledge sharing among participants, along with a prompt optimization scheme that enables learning from synthetic samples. Our extensive experiments demonstrate that LanFL successfully facilitates learning among participants while preserving the privacy of local datasets across various tasks.

Paper Structure

This paper contains 12 sections, 2 theorems, 10 equations, 3 figures, 2 tables, and 1 algorithm.

Key Result

Theorem 1

The mechanism is $(\delta, \epsilon)$-differentially private with $\delta=\frac{k}{N+1}$ and $\epsilon=0$. Specifically, for any two data sets $D_1, D_2$ in the domain of $f$ that differ by only one element through addition or deletion, and any subset $S$ in the range of $f$, we have
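
The inequality itself is cut off in this summary. As a hedged reconstruction, assuming the theorem uses the standard definition of approximate differential privacy and that $f$ denotes the mechanism, the guarantee with $\epsilon=0$ would take the following form (this is the textbook inequality, not a quotation from the paper):

```latex
\begin{aligned}
\Pr[f(D_1) \in S] &\le e^{\epsilon}\,\Pr[f(D_2) \in S] + \delta \\
                  &= \Pr[f(D_2) \in S] + \frac{k}{N+1}
                  \qquad (\epsilon = 0,\ \delta = \tfrac{k}{N+1}).
\end{aligned}
```

The meanings of $k$ and $N$ are not defined in this excerpt. Purely as arithmetic, with hypothetical values $k=1$ and $N=99$ the bound gives $\delta = \frac{1}{100} = 0.01$, so $\delta$ shrinks as $N$ grows relative to $k$.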

Figures (3)

  • Figure 1: LanFL Operations. Step 1: Clients generate synthetic data sets by prompting LLMs using local data sets. Step 2: Clients share the synthetic data sets among themselves. Step 3: Clients learn the best prompt utilizing the synthetic data sets received.
  • Figure 2: Example prompt used to generate a synthetic sample (Output of $M_{syn}$)
  • Figure 3: BLEU score distributions for synthetic samples and paraphrased training samples
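
The three steps in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the `LLM` callable, the prompt wording, and all function names are hypothetical, the differential privacy mechanism is not implemented, and step 3 uses a simple "pick the highest-scoring candidate" placeholder instead of the paper's prompt optimization scheme.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a black-box LLM: takes a prompt, returns text.
LLM = Callable[[str], str]


def generate_synthetic(llm: LLM, local_samples: List[str], n: int) -> List[str]:
    """Step 1: prompt the LLM with local samples to obtain n synthetic ones.

    The prompt wording is an assumption; no privacy mechanism is applied here.
    """
    prompt = (
        "Here are some examples:\n"
        + "\n".join(local_samples)
        + "\nWrite one new example in the same style."
    )
    return [llm(prompt) for _ in range(n)]


def share(synthetic_by_client: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Step 2: each client receives the synthetic sets of every other client."""
    return {
        client: [
            sample
            for other, samples in synthetic_by_client.items()
            if other != client
            for sample in samples
        ]
        for client in synthetic_by_client
    }


def pick_best_prompt(candidates: List[str], score: Callable[[str], float]) -> str:
    """Step 3 (placeholder): keep the candidate prompt with the highest score.

    In practice `score` would evaluate a prompt on the received synthetic data.
    """
    return max(candidates, key=score)
```

Only synthetic samples cross client boundaries in this sketch; the raw local samples are used solely to build the generation prompt in step 1.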

Theorems & Definitions (2)

  • Theorem 1
  • Corollary 1