Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

TL;DR

This work addresses privacy in language-model fine-tuning by advocating for user-level differential privacy, which ensures uniform protection across users, rather than per-record guarantees. It systematically evaluates two mechanisms, Group Privacy and User-wise DP-SGD, and studies data-selection strategies to balance privacy and utility on Enron Email and BookSum using GPT-2 with LoRA. The empirical results show that User-wise DP-SGD generally yields better utility than Group Privacy, especially at stricter privacy budgets, and that data-selection choices significantly influence outcomes: Random Chunk selection is a strong baseline for User-wise DP-SGD, and longest-record selection is a strong baseline for Group Privacy. The findings offer practical guidance for deploying user-level DP in language-model fine-tuning, while also highlighting limitations of advanced gradient-concentration-based approaches and suggesting directions for future work on dynamic per-user contributions and broader domains.

Abstract

Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs mostly treat each example (text record) as the privacy unit. This leads to uneven user privacy guarantees when contributions per user vary. We therefore study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users. We present a systematic evaluation of user-level DP for LLM fine-tuning on natural language generation tasks. Focusing on two mechanisms for achieving user-level DP guarantees, Group Privacy and User-wise DP-SGD, we investigate design choices like data selection strategies and parameter tuning for the best privacy-utility tradeoff.
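
To make the unevenness concrete, the standard group-privacy property of DP can be written out. This is a general fact about DP rather than a result of this paper; $M$ (a record-level mechanism) and $k$ (the number of records one user contributes) are symbols introduced here for illustration.

$$
M \text{ is } \varepsilon\text{-DP per record}
\;\Longrightarrow\;
M \text{ is } k\varepsilon\text{-DP for a user with } k \text{ records}
$$

$$
M \text{ is } (\varepsilon,\delta)\text{-DP per record}
\;\Longrightarrow\;
M \text{ is } \Bigl(k\varepsilon,\ \delta\,\frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}\Bigr)\text{-DP for that user}
$$

For example, with a record-level guarantee of $\varepsilon=1$ in the pure-DP case, a user who contributes 10 records is only guaranteed $\varepsilon=10$, while a user who contributes a single record keeps $\varepsilon=1$.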

Paper Structure

This paper contains 34 sections, 1 equation, 7 figures, 12 tables, 2 algorithms.

Figures (7)

  • Figure 1: The privacy unit is a crucial parameter for DP guarantees. Previous works commonly treat each training example or record as the unit of privacy (i.e., record-level DP, shown on the left). However, this leads to unequal privacy protection when users contribute varying numbers of records --- users with more records unfortunately obtain weaker privacy guarantees under record-level DP. To address this discrepancy, we consider the stronger requirement of user-level DP (shown on the right) for use cases where it might be necessary or applicable. User-level DP ensures uniform privacy protection across all users, regardless of each individual's number of contributed records.
  • Figure 2: Illustration of how Group Privacy (left) and User-wise DP-SGD (right) preprocess training data, sample records, and calculate gradients at each training step.
  • Figure 3: Effective noise of User-wise DP-SGD and the advanced method proposed by [asi2023user] under different numbers of records per user ($k$), with $\varepsilon=3.0$. As shown, for the method of [asi2023user] to yield lower noise than standard User-wise DP-SGD, the ratio between the concentration factor $\tau$ and the clipping norm $C$ must be smaller than 0.1. However, in the evaluated datasets, $\tau/C$ is around 1. A further figure (fig:noise_advanced_full) presents results for other values of $\varepsilon$.
  • Figure 4: Perplexity of Group Privacy (a) and User-wise DP-SGD (b) on the BookSum dataset with a privacy budget of $\varepsilon=1.0$, under varying clipping norms and batch sizes. Using larger batch sizes generally improves the performance for both methods. However, Group Privacy exhibits higher sensitivity to clipping norm variations compared to User-wise DP-SGD.
  • Figure 5: Distribution of number of records per privacy unit in Enron Email (left) and BookSum (right).
  • ...and 2 more figures
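
The contrast drawn in Figures 1 and 2 between record-wise and user-wise gradient handling can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the common formulation in which User-wise DP-SGD averages each user's record gradients and clips once per user, and all function names, the toy gradients, and the noise multiplier are hypothetical. It does not perform any privacy accounting.

```python
import numpy as np


def clip_to_norm(vec, clip_norm):
    """Scale vec so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(vec)
    return vec * min(1.0, clip_norm / max(norm, 1e-12))


def record_level_sum(per_user_grads, clip_norm):
    """Clip every record gradient separately (record-level DP-SGD style)."""
    return sum(clip_to_norm(g, clip_norm) for grads in per_user_grads for g in grads)


def user_level_sum(per_user_grads, clip_norm):
    """Average each user's record gradients, then clip once per user (assumed formulation)."""
    return sum(clip_to_norm(grads.mean(axis=0), clip_norm) for grads in per_user_grads)


def noisy_mean(clipped_sum, num_units, clip_norm, noise_multiplier, rng):
    """Add Gaussian noise scaled to the clipping norm, then average over units."""
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped_sum.shape)
    return (clipped_sum + noise) / num_units


# Hypothetical toy gradients: user A contributes 3 records, user B contributes 1.
user_a = np.array([[3.0, 4.0], [3.0, 4.0], [3.0, 4.0]])
user_b = np.array([[0.3, 0.4]])
users = [user_a, user_b]
C = 1.0  # clipping norm

# Influence of user A alone on the clipped sum under each scheme.
influence_record = np.linalg.norm(record_level_sum([user_a], C))
influence_user = np.linalg.norm(user_level_sum([user_a], C))

# One noisy user-level step (noise_multiplier is an arbitrary illustrative value).
rng = np.random.default_rng(0)
step = noisy_mean(user_level_sum(users, C), len(users), C, noise_multiplier=1.0, rng=rng)
```

In this toy setup, user A can move the record-level clipped sum by up to three times the clipping norm, but the user-level sum by at most the clipping norm itself. Noise calibrated to a single record therefore does not cover a heavy contributor, which is the gap that user-level mechanisms are meant to close.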