
Analysis of Privacy Leakage in Federated Large Language Models

Minh N. Vu, Truc Nguyen, Tre' R. Jeter, My T. Thai

TL;DR

This work designs two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakage of various adapted FL configurations, revealing substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs, across multiple real-world language datasets.

Abstract

With the rapid adoption of Federated Learning (FL) as the training and tuning protocol for applications built on Large Language Models (LLMs), recent research highlights the need for significant modifications to FL to accommodate the large scale of LLMs. While substantial adjustments to the protocol have been introduced in response, a comprehensive privacy analysis of the adapted FL protocol is still lacking. To address this gap, our work examines the privacy of FL when used to train LLMs, from both theoretical and practical perspectives. In particular, we design two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakage of various adapted FL configurations. Our theoretical findings are translated into practical attacks, revealing substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs, across multiple real-world language datasets. Additionally, we conduct thorough experiments to evaluate the privacy leakage of these models when data is protected by state-of-the-art differential privacy (DP) mechanisms.
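The abstract does not say which DP mechanisms are evaluated. As a hedged illustration only, the sketch below shows one widely used building block, the Gaussian mechanism applied to a clipped client update in the style of DP-SGD / DP-FedAvg. The function name `clip_and_noise` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def clip_and_noise(update: np.ndarray, clip_norm: float, noise_multiplier: float,
                   rng: np.random.Generator) -> np.ndarray:
    """Gaussian mechanism on one client update (DP-SGD / DP-FedAvg style sketch).

    The update is rescaled so its L2 norm is at most clip_norm, then isotropic
    Gaussian noise with standard deviation noise_multiplier * clip_norm is added.
    """
    norm = np.linalg.norm(update)
    # Scale down only if the update is longer than clip_norm; never scale up.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = update * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise


# Usage: privatize a toy two-dimensional update before it is sent to the server.
rng = np.random.default_rng(0)
noisy_update = clip_and_noise(np.array([3.0, 4.0]), clip_norm=1.0,
                              noise_multiplier=1.1, rng=rng)
```

Clipping bounds each client's contribution to `clip_norm`, which is what makes the added noise translate into a DP guarantee; with `noise_multiplier=0.0` the function reduces to plain norm clipping.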

Paper Structure

This paper contains 20 sections, 7 theorems, 41 equations, 6 figures, 11 tables, and 3 algorithms.

Key Result

Lemma 1

The advantage of the adversary $\mathcal{A}_{\mathsf{FC}}$ in the security game $\mathsf{Exp}^{\textup{AMI}}$ is 1, i.e., $\mathbf{Adv}^{\textup{AMI}}(\mathcal{A}_{\mathsf{FC}}) = 1$. (Proof in the appendix.)
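Lemma 1 concerns a fully connected adversary whose exact construction is not given in this summary. As a toy illustration of why a perfect advantage is plausible, and not as the paper's $\mathcal{A}_{\mathsf{FC}}$, the sketch below plays a simplified membership game. The server plants one hypothetical "trap" ReLU neuron whose weight is the target sample and whose bias is $-(1-\delta)$, so on unit-norm data it activates only for inputs whose cosine similarity to the target exceeds $1-\delta$; its gradient is then non-zero exactly when the target is in the client's batch. The unit-sphere data, the toy loss, and all function names are assumptions, and the advantage is taken here as $\Pr[\text{guess}=1 \mid \text{member}] - \Pr[\text{guess}=1 \mid \text{non-member}]$.

```python
import numpy as np


def trap_neuron(target: np.ndarray, delta: float):
    """Hypothetical trap neuron: weight = target, bias = -(1 - delta).

    For unit-norm inputs x, w.x + b > 0 iff cos(x, target) > 1 - delta.
    """
    return target.copy(), -(1.0 - delta)


def client_gradient(w: np.ndarray, b: float, batch: np.ndarray) -> np.ndarray:
    """Gradient of the toy loss sum_x ReLU(w.x + b) with respect to w.

    Each active sample contributes x; inactive samples contribute nothing.
    """
    active = batch @ w + b > 0
    return batch[active].sum(axis=0)


def adversary_guess(batch: np.ndarray, target: np.ndarray, delta: float) -> int:
    """Guess 'member' (1) iff the trap neuron received a non-zero gradient."""
    w, b = trap_neuron(target, delta)
    return int(np.any(client_gradient(w, b, batch) != 0))


def estimate_advantage(n_trials: int = 200, n_points: int = 32, dim: int = 64,
                       delta: float = 0.1, seed: int = 0) -> float:
    """Estimate Pr[guess=1 | member] - Pr[guess=1 | non-member] on random unit vectors."""
    rng = np.random.default_rng(seed)
    hits_member = hits_nonmember = 0
    for _ in range(n_trials):
        target = rng.normal(size=dim)
        target /= np.linalg.norm(target)
        others = rng.normal(size=(n_points, dim))
        others /= np.linalg.norm(others, axis=1, keepdims=True)
        # Member world: the target is in the client's batch.
        hits_member += adversary_guess(np.vstack([others, target]), target, delta)
        # Non-member world: the batch holds only the other samples.
        hits_nonmember += adversary_guess(others, target, delta)
    return (hits_member - hits_nonmember) / n_trials


adv = estimate_advantage()
```

The perfect score in this toy rests on an assumed separation: if a non-member fell within cosine $1-\delta$ of the target, the trap would also fire and the estimate would drop below 1. Conversely, a trap that fires on everything (e.g., $\delta = 2$) has zero advantage.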

Figures (6)

  • Figure 1: Training/tuning LLMs in FL: the clients typically exchange a small number of trainable parameters $\theta_i$ while keeping most parameters, i.e., $\theta_p$, frozen.
  • Figure 2: The AMI Threat Model as a Security Game.
  • Figure 3: Different scenarios in training/fine-tuning LLMs in FL. The red squares show the privacy leakage surfaces in the threat model. The white and grey boxes indicate the trainable and frozen weights, respectively.
  • Figure 4: The $\mathcal{A}_{\mathsf{Attn}}$ adversary exploiting the self-attention mechanism for membership inference in FL: if the target pattern $v = x_i$ is in the data, the output $z^1_i$ of the filtered head approximates the tokens' average $\bar{X}$ instead of $x_i$. This creates non-zero gradients for the weights that compute the difference between the two heads.
  • Figure 5: Simulations of the lower bound in Eq. (eq:theorem_adv_api_first) for spherical, Gaussian, and one-hot data with $l_X \in \{5, 10, 15\}$ (left). $\beta$ is chosen such that the ratio of $\Delta$ to the right-hand side of condition (eq:delta_cond_first) exceeds 1, i.e., the condition holds (right).
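Figure 1 describes clients that exchange only the trainable parameters $\theta_i$ while $\theta_p$ stays frozen. A minimal sketch of one such round follows, assuming FedAvg-style weighted averaging on the server (the caption does not name the aggregation rule); `local_step`, `fedavg`, and the parameter name `head.weight` are hypothetical.

```python
import numpy as np

Params = dict[str, np.ndarray]


def local_step(trainable: Params, grads: Params, lr: float) -> Params:
    """One SGD step on the trainable parameters theta_i only.

    The frozen parameters theta_p are never passed in, so they cannot change
    here and are never sent to the server.
    """
    return {name: value - lr * grads[name] for name, value in trainable.items()}


def fedavg(client_params: list[Params], num_samples: list[int]) -> Params:
    """Server-side weighted average of the exchanged (trainable) parameters."""
    total = float(sum(num_samples))
    names = client_params[0].keys()
    return {
        name: sum(n * p[name] for n, p in zip(num_samples, client_params)) / total
        for name in names
    }


# Usage: two clients start from the same global head and hold 1 and 3 samples.
global_head = {"head.weight": np.zeros(2)}
client_a = local_step(global_head, {"head.weight": np.array([1.0, -1.0])}, lr=0.5)
client_b = local_step(global_head, {"head.weight": np.array([3.0, 1.0])}, lr=0.5)
new_head = fedavg([client_a, client_b], num_samples=[1, 3])
```

In this sketch the server only ever observes updates to $\theta_i$; the frozen $\theta_p$ still shapes those updates through the forward pass but is not itself transmitted.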

Theorems & Definitions (18)

  • Definition 1
  • Remark 1
  • Lemma 1
  • Theorem 1
  • Remark 2
  • Definition 2
  • Lemma 2
  • Remark 3
  • Remark 4
  • Remark 5