On Improving the Composition Privacy Loss in Differential Privacy for Fixed Estimation Error

V. Arvind Rameshwar; Anshoo Tandon

TL;DR

This work addresses the private release of sample means and variances across multiple disjoint grids under user-level differential privacy, where different users may contribute different numbers of samples. It gives exact analytical characterizations of the mean and variance sensitivities and of the worst-case clipping biases, and proposes an iterative Clip-User procedure that suppresses user contributions to reduce the privacy loss under composition without increasing the worst-case estimation error. The authors also provide a post-processing technique based on pseudo-user creation to further reduce grid-wise error, and validate the approach on real Intelligent Traffic Management System (ITMS) data and on synthetic datasets, showing clear reductions in privacy loss for a fixed worst-case estimation error. The methodology applies to a broad class of grid-based private statistics and offers practical guidance for tracking privacy loss in multi-query, multi-grid settings, with potential extensions to approximate DP and other statistics.

Abstract

This paper considers the private release of statistics of disjoint subsets of a dataset, in the setting of data heterogeneity, where users could contribute more than one sample, with different users contributing potentially different numbers of samples. In particular, we focus on the $ε$-differentially private release of sample means and variances of sample values in disjoint subsets of a dataset, under the assumption that the number of contributions of each user in each subset is publicly known. Our main contribution is an iterative algorithm, based on suppressing user contributions, which seeks to reduce the overall privacy loss degradation under a canonical Laplace mechanism, while not increasing the worst estimation error among the subsets. Important components of this analysis are our exact, analytical characterizations of the sensitivities and the worst-case bias errors of estimators of the sample mean and variance, which are obtained by clipping or suppressing user contributions. We test the performance of our algorithm on real-world and synthetic datasets and demonstrate clear improvements in the privacy loss degradation, for fixed worst-case estimation error.

Paper Structure (32 sections, 19 theorems, 93 equations, 8 figures, 1 algorithm)

Introduction
Comparison with related work and contributions
Organization of material
Preliminaries
Notation
Motivation and Problem Setup
Problem Formulation
User-Level Differential Privacy
Composition of User-Level DP Mechanisms
Mechanisms for Releasing DP Estimates
Baseline
User-Level sensitivities of $\mu$ and $\textsf{Var}$
Clip
Worst-Case Errors in Estimation of Sample Mean and Variance
An Error Metric and an Algorithm for Controlling Privacy Loss
...and 17 more sections

Key Result

Theorem 2.1

For a function $\theta: \mathsf{D}\to \mathbb{R}^d$, the mechanism $M^{\text{Lap}}: \mathsf{D}\to \mathbb{R}^d$ defined by $$M^{\text{Lap}}(\mathcal{D}) = \theta(\mathcal{D}) + Z,$$ where $Z = (Z_1,\ldots,Z_d)$ is such that $Z_i\sim \text{Lap}(\Delta_\theta/\epsilon)$, is user-level $\epsilon$-DP.
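
As a rough illustration of Theorem 2.1, the following sketch adds Laplace noise of scale $\Delta_\theta/\epsilon$ to each coordinate of an already-computed statistic. It is not the authors' implementation. The function name `laplace_mechanism` and its parameters are hypothetical, the noise coordinates are assumed to be drawn independently, and the caller is assumed to supply a valid user-level sensitivity $\Delta_\theta$ (this sketch does not compute it).

```python
import numpy as np


def laplace_mechanism(theta_value, sensitivity, epsilon, rng=None):
    """Return theta_value + Z, with each Z_i ~ Lap(sensitivity / epsilon).

    theta_value : the statistic theta evaluated on the dataset (scalar or length-d array).
    sensitivity : assumed user-level sensitivity Delta_theta, supplied by the caller.
    epsilon     : privacy parameter, must be positive.
    rng         : optional numpy Generator, for reproducibility.
    """
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")

    rng = np.random.default_rng() if rng is None else rng
    theta_value = np.atleast_1d(np.asarray(theta_value, dtype=float))

    # Laplace scale parameter b = Delta_theta / epsilon, as in the theorem.
    scale = sensitivity / epsilon

    # Assumption: one independent Laplace draw per coordinate of theta.
    noise = rng.laplace(loc=0.0, scale=scale, size=theta_value.shape)
    return theta_value + noise


# Hypothetical usage: a 2-dimensional statistic with assumed sensitivity 0.5.
noisy = laplace_mechanism([1.0, 2.0], sensitivity=0.5, epsilon=1.0,
                          rng=np.random.default_rng(0))
```

Because the noise scale is $\Delta_\theta/\epsilon$, any reduction in the sensitivity lowers the noise added at a given $\epsilon$, or equivalently allows a smaller $\epsilon$ at the same noise level.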

Figures (8)

Figure 1: Plot of privacy loss under composition ${P}_\epsilon$ after execution of Clip-User on the real-world ITMS dataset, against the original privacy loss $G_1\epsilon = 11\epsilon$.
Figure 2: Plot of worst-case error $\overline{E}_\epsilon$ after execution of Clip-User and the implementation of the pseudo-user creation-based clipping strategy on the real-world ITMS dataset, against the original worst-case error $E_\epsilon$.
Figure 3: Plot of estimate $\widehat{P}_\epsilon$ of expected privacy loss, after execution of Clip-User on a synthetic dataset, against the original privacy loss $G\epsilon$. Here, $\gamma = 3$ and $q = 0.01$.
Figure 4: Plot of estimate $\widehat{P}_\epsilon$ of expected privacy loss, after execution of Clip-User on a synthetic dataset, against the original privacy loss $G\epsilon$. Here, $\gamma = 6$ and $q = 0.01$.
Figure 5: Plot of estimate $\widehat{P}_\epsilon$ of expected privacy loss, after execution of Clip-User on a synthetic dataset, against the original privacy loss $G\epsilon$. Here, $\gamma = 9$ and $q = 0.01$.
...and 3 more figures

Theorems & Definitions (30)

Definition 2.1
Definition 2.2
Theorem 2.1
Proposition 2.1
Theorem 2.2: Basic Composition Theorem
Theorem 2.3
proof
Corollary 2.1
Proposition 3.1
Corollary 3.1
...and 20 more
