Differentially Private Empirical Cumulative Distribution Functions
Antoine Barczewski, Amal Mawass, Jan Ramon
TL;DR
The paper addresses the private computation of empirical CDFs (ECDFs) in a federated setting. It proposes a DP mechanism that publishes a complete ECDF with logarithmic noise growth, followed by a smoothing step that enforces monotonicity. Two computational pathways are developed to evaluate ECDFs and their inverses: a generic secure aggregation approach and a function secret sharing (FSS) approach. The authors prove DP guarantees with explicit error bounds. They apply these techniques to two metrics, DP ROC curves and DP Hosmer-Lemeshow calibration, and validate performance through experiments on synthetic and real-world datasets, including analyses of smoothing effects and runtime. The result is a modular framework for private distributional statistics in collaborative settings, which supports analyses that are more informative than simple DP summary statistics.
Abstract
There has been growing interest in privacy-preserving machine learning methods, which aim to learn from sensitive training data while protecting it. Differential privacy has emerged as an important measure of privacy. We are interested in the federated setting, where a group of parties each hold one or more training instances and want to learn collaboratively without revealing their data. In this paper, we propose strategies to compute differentially private empirical distribution functions. While revealing complete functions is more expensive in terms of privacy budget, it may also provide richer and more valuable information to the learner. We prove privacy guarantees and discuss the computational cost, both for a generic strategy fitting any security model and for a special-purpose strategy based on secret sharing. We survey a number of applications and present experiments.
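To make the idea of a differentially private ECDF with a monotonicity-restoring smoothing step concrete, the following is a minimal centralized sketch, not the mechanism of the paper. It assumes a fixed grid of bin edges whose number of bins is a power of two, add/remove-one-record neighbouring datasets, Laplace noise on a dyadic tree of counts (one standard way to obtain noise that grows logarithmically in the number of bins), and isotonic regression as the smoothing step. The function names `dp_ecdf` and `isotonic_nondecreasing` are hypothetical, and the secure aggregation and FSS pathways are not modelled.

```python
import numpy as np


def isotonic_nondecreasing(y):
    """Least-squares nondecreasing fit (pool adjacent violators, unit weights)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


def dp_ecdf(values, edges, epsilon, rng):
    """Noisy ECDF evaluated at `edges` (assumed sketch, epsilon-DP under add/remove)."""
    n_bins = len(edges) - 1
    assert n_bins >= 1 and n_bins & (n_bins - 1) == 0, "n_bins must be a power of two"
    levels = n_bins.bit_length()  # number of tree levels = log2(n_bins) + 1
    counts, _ = np.histogram(values, bins=edges)

    # One record touches exactly one node per level, so the L1 sensitivity of
    # the whole tree is `levels`; Laplace scale levels/epsilon gives epsilon-DP.
    scale = levels / epsilon
    tree = []  # tree[l][j] = noisy count of the j-th block of 2**l bins
    level_counts = counts.astype(float)
    for l in range(levels):
        tree.append(level_counts + rng.laplace(0.0, scale, size=level_counts.shape))
        if l < levels - 1:
            level_counts = level_counts.reshape(-1, 2).sum(axis=1)

    # Each prefix count is a sum of at most `levels` noisy dyadic blocks.
    prefix = np.zeros(n_bins + 1)
    for k in range(1, n_bins + 1):
        total, start = 0.0, 0
        for l in reversed(range(levels)):
            size = 1 << l
            if k - start >= size:
                total += tree[l][start // size]
                start += size
        prefix[k] = total

    # Post-processing only: normalise by the noisy total, smooth, and clip.
    noisy_cdf = prefix / max(prefix[-1], 1.0)
    return np.clip(isotonic_nondecreasing(noisy_cdf), 0.0, 1.0)


# Usage on synthetic data (all parameters here are illustrative).
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=1000)
grid = np.linspace(0.0, 1.0, 17)  # 16 bins
F = dp_ecdf(data, grid, epsilon=1.0, rng=rng)
```

Because normalisation, isotonic regression, and clipping only post-process the noisy tree, they do not consume additional privacy budget in this sketch; the error of each prefix count comes from at most log2(n_bins) + 1 noisy terms rather than from one term per bin.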
