A General Framework for Per-record Differential Privacy

Xinghe Chen, Dajun Sun, Quanqing Xu, Wei Dong

TL;DR

This work presents the first general framework for per-record differential privacy (PrDP), enabling any standard DP or LDP mechanism to provide privacy guarantees that depend on a per-record budget function $\mathcal{E}(\cdot)$. The key technical contribution is privacy-specified domain partitioning, which privately estimates $\varepsilon_{\min}(D)$ and guides domain-wise processing so that utility scales with the instance-specific privacy need. The framework extends to per-record local DP (PrLDP) via privacy-specified query augmentation, and the authors show how to obtain PrDP solutions for fundamental tasks like counting, sum, and max, with significant utility gains over existing PDP methods. Experiments on real and synthetic datasets demonstrate substantial reductions in error (up to 165x) and favorable runtimes, validating the practicality of PrDP for diverse queries and privacy functions. The work lays a foundation for broad adoption of per-record privacy in private data analysis, with future directions including dynamic data, streaming queries, and shuffle-DP adaptations.

Abstract

Differential Privacy (DP) is a widely adopted standard for privacy-preserving data analysis, but it assumes a uniform privacy budget across all records, limiting its applicability when privacy requirements vary with data values. Per-record Differential Privacy (PrDP) addresses this by defining the privacy budget as a function of each record, offering better alignment with real-world needs. However, the dependency between the privacy budget and the data value introduces challenges in protecting the budget's privacy itself. Existing solutions either handle specific privacy functions or adopt relaxed PrDP definitions. A simple workaround is to use the global minimum of the privacy function, but this severely degrades utility, as the minimum is often set extremely low to account for rare records with high privacy needs. In this work, we propose a general and practical framework that enables any standard DP mechanism to support PrDP, with error depending only on the minimal privacy requirement among records actually present in the dataset. Since directly revealing this minimum may leak information, we introduce a core technique called privacy-specified domain partitioning, which ensures accurate estimation without compromising privacy. We also extend our framework to the local DP setting via a novel technique, privacy-specified query augmentation. Using our framework, we present the first PrDP solutions for fundamental tasks such as count, sum, and maximum estimation. Experimental results show that our mechanisms achieve high utility and significantly outperform existing Personalized DP (PDP) methods, which can be viewed as a special case of PrDP with relaxed privacy protection.
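The gap between the global minimum of the privacy function and the minimum over records actually present can be made concrete with a little arithmetic. The sketch below is only an illustration, not the paper's mechanism: it borrows the budget function $\mathcal{E}(r) = \alpha / v_{\text{bal}}$ and the constants $\alpha = 10^4$, $U = 1{,}280{,}000{,}000$ from the Figure 1 caption, reads $v_{\text{bal}}$ as a record's balance value (an assumption), and uses a made-up list of balances. It compares the Laplace noise scale for a sensitivity-1 count under the two choices of $\varepsilon$. Note that computing the dataset minimum directly, as done here, is not private; that is exactly the leak the abstract says the framework has to avoid.

```python
ALPHA = 1e4            # constant taken from the Figure 1 caption
U = 1_280_000_000      # domain upper bound taken from the Figure 1 caption


def budget(balance: float) -> float:
    """Per-record budget E(r) = alpha / v_bal (v_bal read as a balance; assumption)."""
    return ALPHA / balance


def laplace_scale(sensitivity: float, eps: float) -> float:
    """Scale of the Laplace noise needed for eps-DP at the given sensitivity."""
    return sensitivity / eps


# Hypothetical dataset: balances of the records actually present.
balances = [120.0, 950.0, 2_000.0, 40.0]

# Global minimum of the privacy function over the whole domain [1, U].
eps_global = budget(U)

# Minimum over records in the dataset. Illustration only: releasing or
# using this value directly would itself leak information.
eps_data = min(budget(v) for v in balances)

print("eps (global minimum): ", eps_global)
print("eps (dataset minimum):", eps_data)
print("count noise scale, global: ", laplace_scale(1.0, eps_global))
print("count noise scale, dataset:", laplace_scale(1.0, eps_data))
```

With these illustrative numbers the noise scale for a count is about 128,000 under the global minimum and 0.2 under the dataset minimum, which is the kind of utility loss the simple workaround incurs when rare high-privacy records are absent.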

Paper Structure

This paper contains 42 sections, 11 theorems, 35 equations, 5 figures, 4 tables, 5 algorithms.

Key Result

lemma 1

(Laplace Mechanism). Given a query $Q : [U]^{n\times d} \to \mathbb{R}$ with global sensitivity $\text{GS}_Q$, the Laplace mechanism $\mathcal{M}(D) = Q(D) + \eta$ satisfies $\varepsilon$-DP, where $\eta$ is drawn from a Laplace distribution with scale $\text{GS}_Q/\varepsilon$, i.e., $\eta \sim \text{Lap} \left( \text{GS}_Q/\varepsilon \right)$.
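As a minimal sketch of the lemma, the standard Laplace mechanism releases the true answer $Q(D)$ plus noise $\eta \sim \text{Lap}(\text{GS}_Q/\varepsilon)$. The function names below are hypothetical, and the sampler uses the fact that the difference of two independent exponential variables with mean $b$ is Laplace-distributed with scale $b$. This uses ordinary floating-point arithmetic and is meant for illustration, not as a hardened implementation.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Lap(scale), centred at zero.

    The difference of two i.i.d. exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(true_answer: float, global_sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Return Q(D) + eta with eta ~ Lap(GS_Q / epsilon)."""
    if epsilon <= 0 or global_sensitivity <= 0:
        raise ValueError("epsilon and global_sensitivity must be positive")
    return true_answer + laplace_noise(global_sensitivity / epsilon, rng)


# Usage: a counting query has global sensitivity 1.
rng = random.Random(0)
noisy_count = laplace_mechanism(true_answer=1000.0, global_sensitivity=1.0,
                                epsilon=0.5, rng=rng)
print(noisy_count)
```

The expected absolute error of the release equals the noise scale $\text{GS}_Q/\varepsilon$, so halving $\varepsilon$ doubles the typical error.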

Figures (5)

  • Figure 1: An illustration of Algorithm \ref{alg:cnt} for a counting problem with the privacy budget function $\mathcal{E}(r) = \alpha / v_{\text{bal}}$, where $\alpha = 10^4$, $U = 1{,}280{,}000{,}000$, and $\hat{\varepsilon} = 4.096$.
  • Figure 2: Illustration of the workflow of Algorithm \ref{alg:gfra}.
  • Figure 3: Comparison of errors in sum estimation under PrDP across different mechanisms, with varying dataset sizes $n$. All axes are in log scale.
  • Figure 4: Comparison of errors in sum estimation under PrDP across different mechanisms, with varying $U$. All axes are in log scale.
  • Figure 5: Comparison of errors in sum estimation under PrDP across different mechanisms, with varying $\sigma$. All axes are in log scale.

Theorems & Definitions (18)

  • definition 1
  • lemma 1
  • lemma 2
  • definition 2
  • theorem 1: fang2022shifted
  • definition 3
  • definition 4
  • definition 5
  • lemma 3
  • lemma 4
  • ...and 8 more