SA-ADP: Sensitivity-Aware Adaptive Differential Privacy for Large Language Models

Stella Etuk, Ashraf Matrawy

TL;DR

This work tackles privacy risks in LLM training by moving beyond uniform noise to token-level protection. It introduces SA-ADP, a three-stage pipeline (PII detection, sensitivity scoring, adaptive noise injection) in which per-token Gaussian noise is calibrated by an Aggregated Privacy Sensitivity Index and the privacy cost is tracked via Rényi DP accounting. Experiments on four diverse datasets show that SA-ADP achieves utility similar to No-DP and DP-SGD while substantially lowering the privacy budget, particularly in PII-dense domains. The approach offers a practical, regulation-aware path to privacy-preserving LLM fine-tuning.

Abstract

Despite advances in the use of large language models (LLMs) in downstream tasks, their ability to memorize information has raised privacy concerns. Protecting personally identifiable information (PII) during LLM training therefore remains a fundamental challenge. Conventional methods such as Differentially Private Stochastic Gradient Descent (DP-SGD) provide robust privacy protection via uniform noising, applying the same protection to all PII regardless of how sensitive each item is. This comes at the expense of the model's utility, creating a privacy-utility trade-off. In this paper, we propose SA-ADP, a sensitivity-aware approach that allocates noise based on the sensitivity of individual PII. We evaluated our method on four datasets (ABCD, CUSTOMERSIM, Wikitext-2, and UNSW-NB15). Our results show that SA-ADP achieves results comparable to the baseline (No-DP) and the conventional DP-SGD. This means that our method did not degrade the model's utility while still maintaining strong privacy protection.

Paper Structure

This paper contains 14 sections, 4 equations, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: Gradient Perturbation in the SA-ADP Framework. At step 1, PII is detected in the input data and passed to the scoring engine at step 2, where each token is assigned a sensitivity score. An adaptively calibrated noise multiplier is then computed and used at step 3 to inject Gaussian noise into the clipped gradients (step 4) [abadi2016deep] during the backward pass, ensuring stronger perturbation for high-sensitivity tokens and minimal distortion for low-sensitivity ones.
  • Figure 2: PII Detection and Sensitivity Scoring. A detection agent processes raw input to identify PII tokens. Each identified token is scored based on frequency, linkability, and datatype parameters. The resulting sensitivity score is used to inform adaptive privacy mechanisms.
  • Figure 3: Accuracy comparison between SA-ADP and DP-SGD across all three datasets.
  • Figure 4: Perplexity comparison between SA-ADP and DP-SGD across all three datasets. Lower perplexity indicates better fluency.
  • Figure 5: Privacy budget comparison between SA-ADP and DP-SGD across all datasets.
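
The flow described in the captions for Figures 1 and 2 (score each PII token, derive a noise multiplier from the score, then add Gaussian noise to a clipped gradient) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the weighted-sum scoring, the `WEIGHTS` values, the linear score-to-multiplier mapping, and the `sigma_min`/`sigma_max` bounds are all assumptions introduced here, since this section does not give the paper's formulas. Rényi DP accounting is omitted.

```python
import math
import random

# Hypothetical weights for the three factors named in the Figure 2 caption.
# The paper's actual scoring formula is not given here; this is an assumption.
WEIGHTS = {"frequency": 0.3, "linkability": 0.4, "datatype": 0.3}


def sensitivity_score(frequency, linkability, datatype):
    """Combine three factors, each in [0, 1], into one score in [0, 1].

    A plain weighted sum is assumed purely for illustration.
    """
    factors = {
        "frequency": frequency,
        "linkability": linkability,
        "datatype": datatype,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in factors.items())


def noise_multiplier(score, sigma_min=0.5, sigma_max=2.0):
    """Map a sensitivity score to a noise multiplier.

    Assumed linear interpolation: higher sensitivity gives more noise.
    """
    return sigma_min + score * (sigma_max - sigma_min)


def clip_gradient(grad, clip_norm):
    """Scale the gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def perturb(grad, score, clip_norm=1.0, rng=None):
    """Clip the gradient, then add Gaussian noise scaled by the score."""
    rng = rng or random.Random()
    clipped = clip_gradient(grad, clip_norm)
    # Standard deviation follows the usual DP-SGD form: multiplier * clip norm.
    std = noise_multiplier(score) * clip_norm
    return [g + rng.gauss(0.0, std) for g in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    high = sensitivity_score(frequency=0.2, linkability=0.9, datatype=1.0)
    low = sensitivity_score(frequency=0.8, linkability=0.1, datatype=0.1)
    print("high-sensitivity multiplier:", noise_multiplier(high))
    print("low-sensitivity multiplier:", noise_multiplier(low))
    print("noisy gradient:", perturb([3.0, 4.0], high, rng=rng))
```

Under these assumptions, a token with high linkability and a sensitive datatype receives a larger multiplier than a frequent but weakly identifying one, which mirrors the "stronger perturbation for high-sensitivity tokens" behaviour described in the Figure 1 caption.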