Noise-Calibrated Inference from Differentially Private Sufficient Statistics in Exponential Families

Amir Asiaee; Samhita Pal

TL;DR

A clean and tractable middle ground for exponential families: release only DP sufficient statistics, then perform noise-calibrated likelihood-based inference and optional parametric synthetic data generation as post-processing.

Abstract

Many differentially private (DP) data release systems either output DP synthetic data and leave analysts to perform inference as usual, which can lead to severe miscalibration, or output a DP point estimate without a principled way to do uncertainty quantification. This paper develops a clean and tractable middle ground for exponential families: release only DP sufficient statistics, then perform noise-calibrated likelihood-based inference and optional parametric synthetic data generation as post-processing. Our contributions are: (1) a general recipe for approximate-DP release of clipped sufficient statistics under the Gaussian mechanism; (2) asymptotic normality, explicit variance inflation, and valid Wald-style confidence intervals for the plug-in DP MLE; (3) a noise-aware likelihood correction that is first-order equivalent to the plug-in but supports bootstrap-based intervals; and (4) a matching minimax lower bound showing the privacy distortion rate is unavoidable. The resulting theory yields concrete design rules and a practical pipeline for releasing DP synthetic data with principled uncertainty quantification, validated on three exponential families and real census data.

Paper Structure (58 sections, 5 theorems, 18 equations, 9 figures, 1 table, 1 algorithm)

Introduction
Contributions and Paper Roadmap
Related Work
DP estimation and minimax rates.
DP inference and confidence intervals.
DP synthetic data generation.
Positioning.
Setup
Exponential Family Models
Differential Privacy and the Gaussian Mechanism
Mechanisms and Estimators
DP Sufficient Statistic Release
Proof sketch.
Plug-in DP MLE and Asymptotic Normality
Proof sketch.
...and 43 more sections

Key Result

Theorem 1

Under Assumption (assump:bounded), releasing $\widetilde{\bar{S}}=\bar{S}(D)+Z$ with $Z\sim \mathcal{N}(0,\sigma^2 I_d)$ and $\sigma$ as in Algorithm 1 (alg:dp_suffstat) is $(\varepsilon,\delta)$-DP. Moreover, any (randomized) post-processing of $\widetilde{\bar{S}}$ (including $\widetilde{\theta}$ and $D
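The release step in Theorem 1 can be illustrated with a minimal sketch. It is not the paper's Algorithm: the calibration of $\sigma$ is not shown in this excerpt, so the code assumes the classical Gaussian-mechanism bound $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (valid for $0<\varepsilon<1$) and an $\ell_2$ sensitivity of $\Delta = 2B/n$ for the mean of statistics clipped to norm $B$ under replace-one adjacency. The function names `clip_l2`, `gaussian_sigma`, and `release_dp_mean_stat` are hypothetical.

```python
import math
import random


def clip_l2(vec, radius):
    """Project a vector onto the L2 ball of the given radius."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= radius:
        return list(vec)
    scale = radius / norm
    return [x * scale for x in vec]


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (an assumption here,
    not necessarily the calibration used in the paper); needs 0 < epsilon < 1."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def release_dp_mean_stat(stats, radius, epsilon, delta, rng):
    """Return a noisy mean of clipped per-record sufficient statistics.

    stats: list of per-record statistic vectors T(x_i), all of equal length.
    """
    n = len(stats)
    d = len(stats[0])
    clipped = [clip_l2(s, radius) for s in stats]
    mean = [sum(row[j] for row in clipped) / n for j in range(d)]
    # Replacing one record moves the clipped mean by at most 2 * radius / n.
    sensitivity = 2.0 * radius / n
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    noisy = [m + rng.gauss(0.0, sigma) for m in mean]
    return noisy, sigma
```

Everything computed afterwards from the returned noisy statistic (an estimate, an interval, or synthetic records) is post-processing and spends no additional privacy budget, which is the point of the second sentence of the theorem.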

Figures (9)

Figure 1: Pipeline overview. The noisy sufficient statistic $\widetilde{\bar{S}}$ is the only DP-protected release; all downstream tasks inherit the same $(\varepsilon,\delta)$-DP guarantee by post-processing.
Figure 2: Empirical versus theoretical variance for the DP plug-in estimator in Gaussian mean estimation. Each point corresponds to one $(n,\varepsilon)$ configuration, with color indicating $\varepsilon$ and marker shape indicating $n$. The close alignment with the identity line validates the finite-sample relevance of Theorem 2.
Figure 3: Coverage of 95% intervals across privacy levels for Gaussian, logistic, and Poisson models, each at two sample sizes. The shaded band marks an acceptable calibration range around nominal coverage. Noise-calibrated DP methods remain near nominal while naive synthetic analysis undercovers in the low-$\varepsilon$ regime.
Figure 4: Average confidence-interval length versus $\varepsilon$ for the same settings as Figure 3. Noise-aware methods are wider at strong privacy (small $\varepsilon$) and contract as $\varepsilon$ increases, reflecting the expected privacy-accuracy trade-off. Naive synthetic intervals remain narrow but are miscalibrated.
Figure 5: Logistic-regression clipping study comparing DP plug-in and DP noise-aware estimators. Left: average absolute bias versus clipping radius $B$. Right: empirical 95% coverage versus $B$. Both methods exhibit a U-shaped bias curve: too-small $B$ causes clipping bias while too-large $B$ increases noise through higher sensitivity. The noise-aware estimator provides no advantage over plug-in, consistent with Proposition 1.
...and 4 more figures
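
The contrast between noise-calibrated and naive intervals in the captions above can be sketched for the simplest case, a scalar Gaussian mean with known sampling variance, where the plug-in estimate is the noisy mean itself. This is an illustrative special case under stated assumptions, not the paper's general variance formula: clipping bias is ignored, and the helper name `wald_interval` is hypothetical. The estimator's variance is then the sampling term `sampling_var / n` plus the privacy term `sigma_dp ** 2`; leaving out the second term gives the naive interval.

```python
import math
from statistics import NormalDist


def wald_interval(theta_tilde, sampling_var, n, sigma_dp=0.0, level=0.95):
    """Wald interval for a scalar mean released with Gaussian noise.

    sigma_dp is the standard deviation of the privacy noise added to the
    mean; sigma_dp=0.0 reproduces the naive (noise-unaware) interval.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    total_var = sampling_var / n + sigma_dp ** 2  # sampling + privacy noise
    half_width = z * math.sqrt(total_var)
    return theta_tilde - half_width, theta_tilde + half_width


# Same noisy estimate, with and without accounting for the privacy noise.
naive = wald_interval(0.3, sampling_var=1.0, n=100)
calibrated = wald_interval(0.3, sampling_var=1.0, n=100, sigma_dp=0.2)
```

Because a noise scale calibrated to a mean shrinks like $1/(n\varepsilon)$, the privacy term in this sketch is of order $1/(n^2\varepsilon^2)$, which is why the calibrated interval is wider at small $\varepsilon$ and approaches the naive one as $\varepsilon$ grows.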

Theorems & Definitions (8)

Theorem 1: Privacy of sufficient-statistic release
Theorem 2: Asymptotic distribution and variance inflation
Corollary 1: When does privacy preserve classical efficiency?
Proposition 1: First-order equivalence
Theorem 3: Unavoidable $\Omega(1/(n^2\varepsilon^2))$ MSE lower bound
proof
proof
proof
