Differentially Private Empirical Cumulative Distribution Functions

Antoine Barczewski, Amal Mawass, Jan Ramon

TL;DR

The paper addresses private computation of empirical CDFs (ECDFs) in a federated setting. It proposes a DP mechanism that publishes a complete ECDF with logarithmic noise growth, followed by a smoothing step that enforces monotonicity. Two computational pathways are developed to evaluate ECDFs and their inverses: a generic secure aggregation approach and a function secret sharing (FSS) approach. The authors prove DP guarantees with explicit error bounds. They apply these techniques to two metrics, DP ROC curves and DP Hosmer-Lemeshow calibration, and evaluate them experimentally on synthetic and real-world datasets, including analyses of smoothing effects and runtime. The result is a modular framework for private distributional statistics in collaborative settings that releases whole functions rather than only scalar DP summaries.
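To make the link between ECDFs and ROC curves concrete, here is a minimal sketch of how an ROC curve follows from two class-conditional ECDFs evaluated on a shared threshold grid. It relies only on the standard identities FPR(τ) = 1 − F_neg(τ) and TPR(τ) = 1 − F_pos(τ). The function name `roc_from_ecdfs` and the running-maximum step are assumptions made for illustration; in particular, the running maximum is a simple stand-in for restoring monotonicity and is not the paper's smoothing step.

```python
import numpy as np

def roc_from_ecdfs(F_neg, F_pos):
    """Turn two (possibly noisy) ECDFs on a shared threshold grid into ROC points.

    F_neg[i] / F_pos[i]: fraction of negative / positive scores below tau_i.
    """
    def monotone(F):
        # Assumed stand-in for smoothing: clip to [0, 1], then take a
        # running maximum so the values never decrease.
        return np.maximum.accumulate(np.clip(np.asarray(F, dtype=float), 0.0, 1.0))

    fpr = 1.0 - monotone(F_neg)  # negatives scored at or above the threshold
    tpr = 1.0 - monotone(F_pos)  # positives scored at or above the threshold
    return fpr, tpr

# Hypothetical noisy inputs: F_neg dips from 0.6 to 0.5, which the
# running maximum repairs before the ROC points are computed.
fpr, tpr = roc_from_ecdfs([0.0, 0.6, 0.5, 1.0], [0.0, 0.1, 0.3, 1.0])
```

Because the two ECDFs are the only inputs, any DP guarantee on them carries over to the ROC curve by post-processing.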

Abstract

In order to both learn and protect sensitive training data, there has been a growing interest in privacy preserving machine learning methods. Differential privacy has emerged as an important measure of privacy. We are interested in the federated setting where a group of parties each have one or more training instances and want to learn collaboratively without revealing their data. In this paper, we propose strategies to compute differentially private empirical distribution functions. While revealing complete functions is more expensive from the point of view of privacy budget, it may also provide richer and more valuable information to the learner. We prove privacy guarantees and discuss the computational cost, both for a generic strategy fitting any security model and a special-purpose strategy based on secret sharing. We survey a number of applications and present experiments.

Paper Structure

This paper contains 27 sections, 4 theorems, 22 equations, 10 figures, 1 table, 2 algorithms.

Key Result

Theorem

Publishing $\hat{F}_{\phi}(X,\tau_i)$ for all $i\in[N]$ is $\epsilon$-DP. The expected squared error is $\mathbb{E}[(F_{\phi}(x)-\hat{F}_{\phi}(x))^2]=2(L+1)^3/\epsilon^2$.
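An error of order $(L+1)^3/\epsilon^2$ is what a dyadic (binary-tree) counting mechanism produces. The sketch below illustrates that standard construction; it is not necessarily the paper's exact definition of $\hat{F}_{\phi}$. It assumes $N = 2^L$ equally spaced thresholds on a known range and add/remove-one neighbouring datasets. The function name `dp_prefix_counts` and its parameters are hypothetical. Each data point touches one node per level, so the L1 sensitivity is $L+1$ and Laplace noise of scale $(L+1)/\epsilon$ per node gives $\epsilon$-DP. Each node then has variance $2(L+1)^2/\epsilon^2$, and a prefix count sums at most $L+1$ nodes, so its variance is at most $2(L+1)^3/\epsilon^2$.

```python
import numpy as np

def dp_prefix_counts(x, L, epsilon, lo=0.0, hi=1.0, rng=None):
    """Noisy counts of points below tau_i = lo + i*(hi-lo)/2**L, for i = 1..2**L."""
    rng = np.random.default_rng() if rng is None else rng
    N = 2 ** L
    # Leaf histogram: bin k covers [tau_k, tau_{k+1}).
    leaves, _ = np.histogram(np.clip(x, lo, hi), bins=N, range=(lo, hi))
    scale = (L + 1) / epsilon  # L1 sensitivity (L+1) divided by epsilon

    # Level l has 2**l nodes, each the sum of N / 2**l adjacent leaves.
    noisy_levels = []
    for level in range(L + 1):
        counts = leaves.reshape(2 ** level, -1).sum(axis=1).astype(float)
        noisy_levels.append(counts + rng.laplace(0.0, scale, size=counts.shape))

    # Write [0, i) as a union of at most L+1 dyadic blocks (binary digits of i).
    prefix = np.zeros(N)
    for i in range(1, N + 1):
        total, start = 0.0, 0
        for level in range(L + 1):
            width = N >> level
            if i & width:
                total += noisy_levels[level][start // width]
                start += width
        prefix[i - 1] = total
    return prefix
```

Dividing the result by the number of instances gives an ECDF estimate. The noisy values need not be monotone, which is why a subsequent smoothing step is useful.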

Figures (10)

  • Figure 1: ROC curve for logistic regression on the Heart disease dataset, and $\epsilon$-DP curves with $\epsilon=0.5$.
  • Figure 2: Effect of smoothing on DP error - fixed $\lambda=3$
  • Figure 3: Effect of smoothing on DP error - fixed $\epsilon=1$
  • Figure 4: Inverse ECDF
  • Figure 5: ROC curve estimation error
  • ...and 5 more figures

Theorems & Definitions (11)

  • Definition: Differential privacy
  • Definition: U-statistic
  • Definition: Empirical cumulative distribution function
  • Theorem
  • Theorem
  • Lemma
  • Proof
  • Proof
  • Theorem
  • Proof
  • ...and 1 more