
AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs

Madhava Gaikwad

TL;DR

AlignDP presents a proactive, rarity-aware privacy lock for LLM telemetry. It partitions data into rare and non-rare events, applies PAC indistinguishability to rare events, and privatizes non-rare events with RAPPOR. A global aggregator enforces budgeting and DP composition to bound leakage, addressing the limitations of post hoc defenses. The work provides theoretical guarantees for both components: an effective zero-$\epsilon$ local DP guarantee for rare events, an unbiased (debiased) frequency estimator under $\epsilon$-LDP for non-rare events, and standard DP composition bounds. A toy operational illustration demonstrates feasibility: rare signals remain hidden, while frequent patterns are recoverable with controllable noise, yielding useful aggregate statistics with limited leakage.
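
The non-rare path can be illustrated with a minimal sketch of basic one-time RAPPOR without the Bloom filter: one-hot encode the category, flip each bit independently, then debias the column sums. This is a simplifying assumption. Full RAPPOR also uses Bloom-filter hashing and a two-stage (permanent plus instantaneous) randomization, and the paper's parameters are not given here. The function names and toy counts are hypothetical.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability that each bit of the one-hot vector is reported truthfully.

    Two one-hot vectors differ in exactly two positions, so keeping each bit
    with p = e^(eps/2) / (e^(eps/2) + 1) bounds the likelihood ratio of any
    report by (p / (1 - p))^2 = e^eps, i.e. eps-LDP.
    """
    return math.exp(epsilon / 2) / (math.exp(epsilon / 2) + 1)


def privatize(index: int, k: int, epsilon: float, rng: random.Random) -> list[int]:
    """One-hot encode `index` over k categories, then flip each bit independently."""
    p = keep_probability(epsilon)
    report = []
    for j in range(k):
        true_bit = 1 if j == index else 0
        report.append(true_bit if rng.random() < p else 1 - true_bit)
    return report


def estimate_counts(reports: list[list[int]], epsilon: float) -> list[float]:
    """Debiased per-category counts.

    With q = 1 - p, E[C_j] = n_j * p + (n - n_j) * q, so
    n_hat_j = (C_j - n * q) / (p - q) is an unbiased estimate of n_j.
    """
    n, k = len(reports), len(reports[0])
    p = keep_probability(epsilon)
    q = 1 - p
    column_sums = [sum(r[j] for r in reports) for j in range(k)]
    return [(c - n * q) / (p - q) for c in column_sums]


def frequency_std(n: int, epsilon: float) -> float:
    """Standard deviation of each estimated frequency n_hat_j / n.

    Var(C_j) = n * p * q for every category j, so the error decays like 1 / sqrt(n).
    """
    p = keep_probability(epsilon)
    q = 1 - p
    return math.sqrt(p * q / n) / (p - q)


if __name__ == "__main__":
    rng = random.Random(0)
    k, epsilon = 4, 2.0
    # Hypothetical toy population: two frequent, one medium, one uncommon category.
    true_indices = [0] * 6000 + [1] * 3000 + [2] * 900 + [3] * 100
    reports = [privatize(i, k, epsilon, rng) for i in true_indices]
    print("estimated counts:", [round(x) for x in estimate_counts(reports, epsilon)])
    print("frequency std:", round(frequency_std(len(true_indices), epsilon), 4))
```

Because the flip probability does not depend on the category, the same correction $(C_j - nq)/(p - q)$ applies to every column, and `frequency_std` shows the $1/\sqrt{n}$ decay of the estimation error as the number of reports grows.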

Abstract

Large language models are exposed to risks of extraction, distillation, and unauthorized fine-tuning. Existing defenses rely on watermarking or monitoring, but these act only after leakage has occurred. We design AlignDP, a hybrid privacy lock that blocks knowledge transfer at the data interface. The key idea is to separate rare and non-rare fields. Rare fields are shielded by PAC indistinguishability, giving an effective zero-epsilon local DP guarantee. Non-rare fields are privatized with RAPPOR, giving unbiased frequency estimates under local DP. A global aggregator enforces composition and the privacy budget. This two-tier design hides rare events and adds controlled noise to frequent events. We prove limits on extending PAC guarantees to global aggregation, give bounds for RAPPOR estimates, and analyze the utility trade-off. A toy simulation confirms feasibility: rare categories remain hidden, and frequent categories are recovered with small error.
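
The aggregator's budget enforcement can be sketched as a simple accountant. This sketch assumes basic sequential composition (privacy losses add up) for admission decisions, and it also shows the standard advanced composition bound for $k$ repetitions of a pure $\epsilon$-DP mechanism. The class name, the refusal policy, and the zero cost assigned to rare-path releases are hypothetical illustrations, not the paper's aggregator.

```python
import math


def basic_composition(epsilons: list[float]) -> float:
    """Sequential composition: running eps_i-DP mechanisms in sequence is (sum eps_i)-DP."""
    return sum(epsilons)


def advanced_composition(epsilon: float, k: int, delta_prime: float) -> float:
    """Advanced composition bound for k runs of a pure eps-DP mechanism.

    The k-fold composition is (eps', delta')-DP with
    eps' = sqrt(2 k ln(1 / delta')) * eps + k * eps * (e^eps - 1).
    """
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * epsilon
            + k * epsilon * (math.exp(epsilon) - 1))


class BudgetAccountant:
    """Hypothetical global aggregator that refuses releases once the budget is spent."""

    def __init__(self, total_epsilon: float) -> None:
        self.total_epsilon = total_epsilon
        self.spent: list[float] = []

    def remaining(self) -> float:
        return self.total_epsilon - basic_composition(self.spent)

    def charge(self, epsilon: float) -> bool:
        """Admit a release costing `epsilon` if it fits; return whether it was admitted."""
        if epsilon < 0:
            raise ValueError("epsilon must be non-negative")
        # Small tolerance so float rounding does not refuse a release that exactly fits.
        if basic_composition(self.spent) + epsilon > self.total_epsilon + 1e-12:
            return False
        self.spent.append(epsilon)
        return True


if __name__ == "__main__":
    accountant = BudgetAccountant(total_epsilon=1.0)
    print([accountant.charge(0.3) for _ in range(5)])  # non-rare releases, 0.3 each
    print(accountant.charge(0.0))  # a zero-cost (rare-path) release is still admitted
    print("remaining:", accountant.remaining())
    print("advanced bound, 100 x eps=0.1:", advanced_composition(0.1, 100, 1e-5))
```

With a total budget of 1.0 and releases costing 0.3 each, the first three are admitted and the rest are refused. For many small releases, the advanced bound grows roughly like $\sqrt{k}$ rather than $k$, at the cost of a small failure probability $\delta'$.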

Paper Structure

This paper contains 28 sections, 15 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: AlignDP pipeline. Rare events shielded by PAC, non-rare privatized with RAPPOR, aggregator enforces budget before LLM access.
  • Figure 2: AlignDP experimental validation. (a) Frequency recovery. (b) Error decay with sample size. (c) Resistance to extraction with repeated queries. (d) PAC bound validation.
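
The routing step in the Figure 1 pipeline can be sketched under two assumptions. First, rarity is decided by a threshold on public or already-privatized reference counts. Second, all rare values are collapsed into one shared bucket before the same one-hot randomizer is applied. This is a stand-in for, not an implementation of, the paper's PAC indistinguishability. Because every rare value maps to the same input, the privacy loss between any two rare values is exactly zero, while any other pair costs at most $\epsilon$. The exhaustive check below computes this exactly over all $2^k$ reports. All names are hypothetical.

```python
import itertools
import math

OTHER = "__rare__"  # hypothetical shared bucket that every rare value is mapped to


def build_domain(reference_counts: dict[str, int], threshold: int) -> list[str]:
    """Frequent categories (count >= threshold) followed by one shared rare bucket.

    Assumption: reference_counts is public or already privatized; deciding
    rarity from raw private counts would itself leak information.
    """
    frequent = sorted(c for c, n in reference_counts.items() if n >= threshold)
    return frequent + [OTHER]


def route(value: str, domain: list[str]) -> int:
    """Index of `value` in the domain; any rare or unseen value maps to OTHER."""
    positions = {c: i for i, c in enumerate(domain[:-1])}
    return positions.get(value, len(domain) - 1)


def report_probability(report: tuple[int, ...], index: int, epsilon: float) -> float:
    """Exact probability of a bit vector under one-hot encoding plus per-bit flipping."""
    p = math.exp(epsilon / 2) / (math.exp(epsilon / 2) + 1)
    prob = 1.0
    for j, bit in enumerate(report):
        prob_one = p if j == index else 1 - p  # chance this bit is reported as 1
        prob *= prob_one if bit else 1 - prob_one
    return prob


def max_privacy_loss(a: str, b: str, domain: list[str], epsilon: float) -> float:
    """Largest |ln P(report | a) - ln P(report | b)| over all 2^k possible reports."""
    ia, ib = route(a, domain), route(b, domain)
    worst = 0.0
    for report in itertools.product((0, 1), repeat=len(domain)):
        pa = report_probability(report, ia, epsilon)
        pb = report_probability(report, ib, epsilon)
        worst = max(worst, abs(math.log(pa) - math.log(pb)))
    return worst


if __name__ == "__main__":
    # Hypothetical reference counts; only the first three clear the threshold.
    reference = {"search": 5000, "code": 3000, "chat": 1500, "rare_a": 3, "rare_b": 1}
    domain = build_domain(reference, threshold=100)
    print(domain)
    print("rare vs rare:", max_privacy_loss("rare_a", "rare_b", domain, 1.0))
    print("frequent vs rare:", max_privacy_loss("search", "rare_a", domain, 1.0))
```

In this sketch the report still distinguishes a rare value from a frequent one up to the usual $\epsilon$ bound. Only the identities of individual rare values are hidden with zero loss, which is one possible reading of the zero-$\epsilon$ guarantee for rare events.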