Gradient-Free Privacy Leakage in Federated Language Models through Selective Weight Tampering

Md Rafi Ur Rashid; Vishnu Asutosh Dasu; Kang Gu; Najrin Sultana; Shagufta Mehnaz

TL;DR

The paper shows that, in federated fine-tuning of language models, model snapshots from intermediate rounds can leak more private data than the final model, and that a malicious FL participant can amplify this leakage by selectively tampering with the weights responsible for memorizing privacy-sensitive data. It introduces gradient-free attacks, notably SWO and WTL, together with a Victim Round Identification (VRI) method for targeting a specific client's data. Dynamic and server-colluded variants increase leakage further; the best-performing method raises membership inference recall by 29% and reconstructs up to 71% of private data. The work also assesses defenses (differential privacy, pruning/regularization, scrubbing, deduplication) and gives FL clients practical guidance for defending against such threats at an acceptable utility cost. Overall, the study highlights tangible privacy risks in FL-LM deployments and the need for co-designed protocols and client-side defenses.

Abstract

Federated learning (FL) has become a key component in various language modeling applications such as machine translation, next-word prediction, and medical record analysis. These applications are trained on datasets from many FL participants that often include privacy-sensitive data, such as healthcare records, phone and credit card numbers, and login credentials. Although FL enables computation without requiring clients to share their raw data, existing works show that privacy leakage is still possible in federated language models. In this paper, we present two novel findings on the leakage of privacy-sensitive user data from federated large language models without requiring access to gradients. First, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Second, we identify that a malicious FL participant can aggravate the leakage by selectively tampering with the model weights that are responsible for memorizing the sensitive training data of other clients, even without any cooperation from the server. Our best-performing method increases the membership inference recall by 29% and achieves up to 71% private data reconstruction, outperforming existing attacks that assume much stronger adversary capabilities. Lastly, we recommend a balanced suite of techniques for an FL client to defend against such privacy risks.

Paper Structure (58 sections, 3 equations, 13 figures, 6 tables, 3 algorithms)

Introduction
Related Work
Privacy Leakage Attacks in LLMs
Privacy Leakage By Poisoning
Privacy Leakage in Federated Learning
Impact of Privacy-Sensitive Data on Fine-tuning Large Language Models
Attack Methodology
Threat model
Victim Round Identification (VRI)
Maximizing Data Memorization
Optional: Attacks With Server Collusion
Static and Dynamic Modes of Attacks
Static Mode
Dynamic Mode
Evaluating Privacy Leakage
...and 43 more sections

Figures (13)

Figure 1: Norms of weight changes in MLP and self-attention layers of all 12 transformer blocks after fine-tuning GPT-2 with (a) regular English texts and (b) privacy-sensitive texts
Figure 2: (a) Change in exposure of the private data over 200 FL rounds. The blue dots indicate the rounds in which the victim participated. (b)-(d) Norms of the weight changes in the MLP and self-attention layers of all 12 transformer blocks of a victim snapshot: (b) without MDM, (c) with SWO, (d) with WTL
Figure 3: Impact of MDM methods on data memorization and the model's utility: (a) for SWO, (b) for WTL. These results are generated by fine-tuning a GPT-2 base model with 1000 plain English texts and 200 out-of-distribution sensitive texts.
Figure 4: Overview of our attack flow. First, we identify the training rounds in which the victim participated. If the server does not cooperate with the adversary, we use the Victim Round Identification algorithm. Next, we update the victim models to maximize the memorization of sensitive information using the Maximizing Data Memorization algorithms. Finally, we prompt the model with crafted prefixes to retrieve sensitive information. More details about the algorithms can be found in the Attack Methodology section.
Figure 5: Number of successful reconstructions for different attack strategies along with the baselines, out of 200 unique canaries for (a) Wikitext with Gemma, (b) Wikitext with GPT-2, (c) Wikitext with BERT, and out of 125 in-house sensitive sequences for (d) Enron with Gemma, (e) Enron with GPT-2, and (f) Enron with BERT. Membership inference (g) recall and (h) precision scores for different attack strategies along with three baselines on the Wikitext dataset using BERT.
...and 8 more figures
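
The per-layer quantity plotted in Figures 1 and 2 (norms of weight changes in MLP and self-attention layers) can be illustrated with a minimal sketch. It assumes the norm is the L2 norm of the difference between a layer's weights in two snapshots; the captions do not say which norm is used, so that choice is an assumption. The function name `weight_change_norms` and the parameter names `block0.attn` and `block0.mlp` are hypothetical, and the weights are toy values flattened to plain lists rather than real model tensors.

```python
import math


def weight_change_norms(before, after):
    """Return the L2 norm of (after - before) for every named parameter.

    `before` and `after` map a parameter name to a flat list of floats.
    The L2 norm is an assumption; the source does not specify the norm.
    """
    norms = {}
    for name, old_values in before.items():
        new_values = after[name]
        if len(old_values) != len(new_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        squared = sum((n - o) ** 2 for o, n in zip(old_values, new_values))
        norms[name] = math.sqrt(squared)
    return norms


# Toy snapshots with hypothetical parameter names (not real GPT-2 keys).
before = {
    "block0.attn": [0.0, 0.0, 0.0],
    "block0.mlp": [1.0, 1.0],
}
after = {
    "block0.attn": [0.0, 3.0, 4.0],  # changed during fine-tuning
    "block0.mlp": [1.0, 1.0],        # unchanged
}

norms = weight_change_norms(before, after)
for name, value in norms.items():
    print(f"{name}: {value:.3f}")
```

Comparing such norms across layers and snapshots indicates which blocks moved most between two points in training, which is the kind of per-layer comparison the figures present.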

Theorems & Definitions (1)

Definition 1: Differential Privacy (Dwork and Roth)
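
For reference, the standard statement of $(\varepsilon, \delta)$-differential privacy from Dwork and Roth is reproduced below. The paper's own wording of Definition 1 is not included in this listing, so its notation may differ; the symbols $\mathcal{M}$, $D$, $D'$, and $S$ are chosen here for illustration.

```latex
A randomized mechanism $\mathcal{M}$ is $(\varepsilon, \delta)$-differentially
private if, for all pairs of adjacent datasets $D$ and $D'$ (differing in a
single record) and for all sets of outputs
$S \subseteq \mathrm{Range}(\mathcal{M})$,
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta .
\]
```

A smaller $\varepsilon$ and $\delta$ mean that the output distribution changes less when one record is added or removed, which is why differential privacy appears among the defenses the paper evaluates.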
