Privacy Amplification for Synthetic data using Range Restriction
Monika Hu, Matthew R. Williams, Terrance D. Savitsky
TL;DR
The paper proposes range-restricted privacy standards for synthetic data by conditioning the risk-weighted pseudo posterior mechanism (PPM) on owner-defined sensitive ranges, which amplifies privacy by restricting protection to a subspace of values. It formalizes two approaches: range-averaged privacy, which uses distributional information within the sensitive range, and range-truncated privacy, which relies on the range endpoints. Both lead to per-record adjustments that tighten the Lipschitz sensitivity in the asymptotic differential privacy (aDP) regime. Through simulations and an accelerated life testing application, the authors show that these range-restricted schemes can achieve stronger privacy for the same budget and, in many settings, improve utility, with trade-offs tunable via the width of the sensitive range and tail assignments. The framework generalizes to a unifying γ-based formulation, connects to aDP and Pufferfish-style privacy notions, and gives data disseminators practical flexibility to tailor protection to subsets of the data while preserving utility. Overall, the work demonstrates concrete pathways to amplify privacy by incorporating publicly known information about sensitive ranges into model-based synthetic data generation.
Abstract
We introduce a new class of range-restricted formal data privacy standards that condition on owner beliefs about sensitive data ranges. By incorporating this additional information, we can provide a stronger privacy guarantee (i.e., an amplification). The range-restricted formal privacy standards protect only a subset (or ball) of data values and exclude ranges (or balls) believed to be already publicly known. The privacy standards are designed for the risk-weighted pseudo posterior (model) mechanism (PPM) used to generate synthetic data under an asymptotic differential privacy (aDP) guarantee. The PPM downweights the likelihood contribution for each record proportionally to its disclosure risk. The PPM is adapted to include these beliefs by adjusting the risk-weighted pseudo likelihood. We introduce two alternative adjustments. The first expresses data owner knowledge of the sensitive range as a probability, $\lambda$, that a datum value drawn from the underlying generating distribution lies outside the ball or subspace of values that are sensitive. The portion of each datum likelihood contribution deemed sensitive is then $(1-\lambda) \leq 1$ and is the only portion of the likelihood subject to risk down-weighting. The second adjustment encodes knowledge as the difference in probability masses $P(R) \leq 1$ between the edges of the sensitive range, $R$. We use the resulting conditional (pseudo) likelihood for a sensitive record, which boosts its worst-case tail values away from 0. We compare privacy and utility properties for the PPM under the aDP and range-restricted privacy standards.
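
The two adjustments described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not state the exact formulas. The sketch assumes a normal model with known mean and standard deviation, per-record risk weights `alphas` in [0, 1], a range-averaged exponent of the form lam + (1 - lam) * alpha (only the sensitive share 1 - lam is down-weighted), and a range-truncated contribution that divides the density by P(R) before weighting. All function and parameter names are hypothetical, and the treatment of records outside the sensitive range under truncation is left out because the abstract does not specify it.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def log_pseudo_likelihood(xs, alphas, mu, sigma):
    """Baseline risk-weighted log pseudo likelihood: sum_i alpha_i * log p(x_i)."""
    return sum(a * normal_logpdf(x, mu, sigma) for x, a in zip(xs, alphas))


def log_range_averaged(xs, alphas, lam, mu, sigma):
    """Assumed range-averaged form: the exponent is lam + (1 - lam) * alpha_i,
    so only the sensitive share (1 - lam) of each contribution is down-weighted."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be a probability in [0, 1]")
    return sum(
        (lam + (1.0 - lam) * a) * normal_logpdf(x, mu, sigma)
        for x, a in zip(xs, alphas)
    )


def log_range_truncated_record(x, alpha, lo, hi, mu, sigma):
    """Assumed range-truncated form for one sensitive record in R = [lo, hi]:
    alpha * log( p(x) / P(R) ), with P(R) = F(hi) - F(lo) <= 1."""
    if not lo <= x <= hi:
        raise ValueError("x must lie inside the sensitive range [lo, hi]")
    p_range = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
    return alpha * (normal_logpdf(x, mu, sigma) - math.log(p_range))


# Illustrative inputs only; these are not values from the paper.
xs = [0.2, -0.7, 0.9]
alphas = [0.5, 0.8, 0.3]
base = log_pseudo_likelihood(xs, alphas, mu=0.0, sigma=1.0)
averaged = log_range_averaged(xs, alphas, lam=0.4, mu=0.0, sigma=1.0)
truncated = sum(
    log_range_truncated_record(x, a, -1.0, 1.0, 0.0, 1.0)
    for x, a in zip(xs, alphas)
)
```

Under these assumptions, setting `lam = 0` recovers the baseline risk-weighted pseudo likelihood and `lam = 1` recovers the unweighted likelihood. Dividing by P(R) <= 1 can only raise each sensitive record's log contribution, which matches the abstract's remark that the conditional likelihood moves worst-case tail values away from 0.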
