Gradient-Free Privacy Leakage in Federated Language Models through Selective Weight Tampering
Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz
TL;DR
The paper demonstrates that, in federated language-model fine-tuning, intermediate-round model snapshots can leak more private data than the final model, and that a malicious FL participant can magnify this leakage by selectively tampering with the weights responsible for memorizing privacy-sensitive data. It introduces gradient-free attacks, notably SWO and WTL, along with a victim-round identification method (VRI) for targeting specific data. Dynamic and server-colluded variants further increase leakage; the best-performing method raises membership inference recall by 29% and achieves up to 71% private data reconstruction. The work also assesses defenses (differential privacy, pruning/regularization, scrubbing, deduplication) and offers practical guidance for FL clients to defend against such threats with acceptable utility loss. Overall, the study highlights tangible privacy risks in FL-LM deployments and emphasizes the need for co-designed protocols and client-side defenses.
Abstract
Federated learning (FL) has become a key component in various language modeling applications such as machine translation, next-word prediction, and medical record analysis. These applications are trained on datasets from many FL participants that often include privacy-sensitive data, such as healthcare records, phone and credit card numbers, and login credentials. Although FL enables computation without requiring clients to share their raw data, existing works show that privacy leakage is still possible in federated language models. In this paper, we present two novel findings on the leakage of privacy-sensitive user data from federated large language models without requiring access to gradients. First, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Second, we identify that a malicious FL participant can aggravate the leakage by selectively tampering with the model weights that are responsible for memorizing the sensitive training data of other clients, even without any cooperation from the server. Our best-performing method increases the membership inference recall by 29% and achieves up to 71% private data reconstruction, outperforming existing attacks that assume much stronger adversary capabilities. Lastly, we recommend a balanced suite of techniques for an FL client to defend against such privacy risks.
