Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs

Thierry Bossy, Julien Vignoud, Tahseen Rabbani, Juan R. Troncoso Pastoriza, Martin Jaggi

TL;DR

The paper addresses unintended memorization in FL-trained LLMs and demonstrates that LoRA fine-tuning substantially reduces data regurgitation without sacrificing predictive utility. It provides extensive empirical evidence across centralized and federated settings, leveraging Llama-2/3 and Mistral models on medical QA tasks, and shows LoRA's compatibility with Goldfish loss, gradient clipping, Gaussian noise, secure aggregation, and DP mechanisms. Key findings include up to a 10x reduction in memorization, modest accuracy trade-offs, and dramatic reductions in communication overhead in FL. The work offers practical guidance for privacy-preserving LLM fine-tuning in data-sensitive domains and suggests avenues for theoretical analysis of the mechanisms behind LoRA's memorization mitigation. Overall, LoRA emerges as a lightweight, scalable tool to enhance privacy in FL without compromising performance, especially when combined with complementary privacy techniques.
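The communication saving can be illustrated with the standard LoRA parameterization, in which a frozen weight matrix W of shape d_out × d_in is adapted as W + (alpha / r) · B · A, with B of shape d_out × r and A of shape r × d_in. Only A and B are trained, so a client would only need to transmit those factors. The sketch below counts trainable values per layer; the layer shape and rank are hypothetical and are not taken from the paper's configuration.

```python
def full_finetune_values(d_out: int, d_in: int) -> int:
    """Values updated (and sent) when the whole weight matrix is trained."""
    return d_out * d_in


def lora_values(d_out: int, d_in: int, rank: int) -> int:
    """Values in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer shape and rank, chosen only for illustration.
D_OUT, D_IN, RANK = 4096, 4096, 16

full = full_finetune_values(D_OUT, D_IN)   # 16_777_216
lora = lora_values(D_OUT, D_IN, RANK)      # 131_072
print(f"full: {full}, LoRA: {lora}, reduction: {full / lora:.0f}x")
```

For this illustrative shape the per-layer payload shrinks by a factor of 128; the actual saving depends on which layers are adapted and on the chosen rank.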

Abstract

Federated learning (FL) is a popular paradigm for collaborative training which avoids direct data exposure between clients. However, data privacy issues remain: FL-trained large language models are capable of memorizing and completing phrases and sentences contained in training data when prompted with their prefixes. Thus, it is possible for adversarial and honest-but-curious clients to recover training data of other participants simply through targeted prompting. In this work, we demonstrate that a popular and simple fine-tuning strategy, low-rank adaptation (LoRA), reduces memorization during FL by up to a factor of 10. We study this effect by performing a medical question-answering fine-tuning task and injecting multiple replicas of out-of-distribution sensitive sequences drawn from an external clinical dataset. We observe a reduction in memorization for a wide variety of Llama 2 and 3 models, and find that LoRA can reduce memorization in centralized learning as well. Furthermore, we show that LoRA can be combined with other privacy-preserving techniques such as gradient clipping and Gaussian noising, secure aggregation, and Goldfish loss to further improve record-level privacy while maintaining performance.
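Gradient clipping with Gaussian noising is commonly implemented by bounding the L2 norm of an update and then adding zero-mean Gaussian noise scaled to that bound. The following is a minimal sketch of that general recipe, not the paper's implementation; the function name, `clip_norm`, and `noise_multiplier` are illustrative assumptions, and the abstract does not state which values the authors use.

```python
import math
import random
from typing import List


def clip_and_noise(update: List[float], clip_norm: float,
                   noise_multiplier: float, rng: random.Random) -> List[float]:
    """Scale `update` to L2 norm <= clip_norm, then add Gaussian noise.

    The noise standard deviation is noise_multiplier * clip_norm, a common
    convention in differentially private training (assumed here).
    """
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    std = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, std) for x in clipped]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy = clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_multiplier=0.5, rng=rng)
print(noisy)
```

With LoRA, such a step would apply to the low-rank factors only, which is one reason the two techniques compose naturally.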

Paper Structure

This paper contains 33 sections, 1 equation, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Downstream accuracy of centralized learning averaged across the 5 benchmarks. LoRA matches full fine-tuning accuracy on every model tested. We report the out-of-the-box accuracy of the pre-trained models as a control. A breakdown per benchmark is included in the appendix.
  • Figure 2: LoRA vs. full fine-tuning memorization scores in centralized learning. LoRA consistently yields lower memorization scores (lower is better). Unless stated otherwise, scores are averaged across prompt lengths. Values are shown when bars are too small. The right-most plots denote the worst-case setting, where memorization scores are the highest. Plots (a)-(c) show memorization using exact match rate with no duplication, 10x document duplication, and 10x document duplication with a 500-token prompt length, while (d)-(f) use BLEU score.
  • Figure 3: Accuracy vs. privacy across fine-tuning steps. We track accuracy and memorization (BLEU score) during Llama 3.2 3B fine-tuning (10× document duplication) using full fine-tuning (Full FT) and LoRA, compared to the base model. Numbers above data points indicate completed fine-tuning steps.
  • Figure 4: Downstream accuracy in federated learning. LoRA yields relatively similar accuracy to full fine-tuning for several LLMs in a heterogeneous FL setting.
  • Figure 5: Exact match rates of federated learning (FL) and centralized learning (CL). We compare memorization between CL and FL when fine-tuning Llama 3.2 3B.
  • ...and 5 more figures
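
The exact match rate used in the memorization figures can be understood as prefix-prompted extraction: prompt the model with the start of a training sequence and check whether its continuation reproduces the true suffix token for token. The sketch below is one plausible way to compute such a score under that assumption; the paper's precise definition may differ, and `generate`, `prefix_len`, and `suffix_len` are hypothetical names. The toy "model" is a lookup table standing in for a real LLM.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Tokens = List[int]


def exact_match_rate(documents: Sequence[Tokens],
                     generate: Callable[[Tokens, int], Tokens],
                     prefix_len: int, suffix_len: int) -> float:
    """Fraction of documents whose suffix is reproduced verbatim."""
    hits, scored = 0, 0
    for doc in documents:
        if len(doc) < prefix_len + suffix_len:
            continue  # too short to split into prefix and suffix
        prefix = doc[:prefix_len]
        target = doc[prefix_len:prefix_len + suffix_len]
        continuation = generate(prefix, suffix_len)
        scored += 1
        hits += int(list(continuation[:suffix_len]) == list(target))
    return hits / scored if scored else 0.0


# Toy stand-in for a model that has memorized exactly one sequence.
memorized: Dict[Tuple[int, ...], Tokens] = {(1, 2, 3): [4, 5, 6]}


def toy_generate(prefix: Tokens, n: int) -> Tokens:
    return memorized.get(tuple(prefix), [0] * n)[:n]


docs = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
print(exact_match_rate(docs, toy_generate, prefix_len=3, suffix_len=3))  # 0.5
```

A BLEU-based score, as in plots (d)-(f) of Figure 2, would replace the strict equality check with an n-gram overlap measure, giving partial credit for near-verbatim continuations.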