Differentially Private Release of Israel's National Registry of Live Births

Shlomi Hod; Ran Canetti

TL;DR

The authors present the first real-world differentially private release of Israel's National Registry of Live Births for 2014. The release covers six microdata-style fields for singleton births (n ≈ 165k–167k) under a total DP budget of $\varepsilon = 9.98$, and its design was formed through a co-design process with stakeholders. They introduce a universal, algorithm-agnostic scheme that obtains DP synthetic data by exploring a configurable space of data transformations, generative models, and hyperparameters. A candidate must satisfy eight acceptance criteria that quantify accuracy for targeted queries (marginals, conditional means, linear regression), as well as a faithfulness constraint. A key innovation is the private selection mechanism, which amortizes the privacy loss of the configuration search. This keeps the whole pipeline end-to-end DP while still allowing DP-validated accuracy metrics to be released publicly. The work also introduces the notions of faithfulness and face privacy to improve trust and avoid unique records. It further demonstrates how public data, PCA-like configuration strategies, and careful privacy-budget allocation can yield a usable, government-grade data release with DP guarantees and actionable documentation for data subjects and users. The release shows that DP is viable for government medical data and provides a blueprint for future synthetic-data-based microdata releases, along with reflections on future improvements and broader implications for policy-relevant research.

Abstract

In February 2024, Israel's Ministry of Health released microdata of live births in Israel in 2014. The dataset is based on Israel's National Registry of Live Births and offers substantial value in multiple areas, such as scientific research and policy-making, while providing a pure differential privacy guarantee with $\varepsilon = 9.98$ for 2014's mothers and newborns. The release was co-designed by the authors together with stakeholders from both inside and outside the Ministry of Health. This paper presents the methodology used to obtain that release, which, to the best of our knowledge, is the first of its kind in the world. The design process was challenging and required flexibility and open-mindedness from all sides involved, along with substantial technical innovation. In particular, we introduce new concepts regarding the desiderata for dataset releases in a microdata format, as well as a way to bundle together multiple quantitative desiderata for a differentially private release using the private selection algorithm of Liu and Talwar (STOC 2019). We hope that the experiences reported here will be useful for future differentially private releases.
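The private selection idea can be illustrated with a minimal Python sketch of the random-stopping variant of Liu and Talwar's algorithm. The sketch assumes that each call to the hypothetical function `run_once` is itself ε-DP. Each call returns a candidate (for example, a synthetic dataset produced by one sampled configuration) together with a privately computed score. After every run, the loop halts with probability `gamma` and returns the best candidate seen so far. Liu and Talwar show that this variant is 3ε-DP when each run is ε-DP. The exact variant, stopping probability, and scoring used in the actual release are not given in this section, so all names and parameters here are illustrative assumptions.

```python
import random
from typing import Any, Callable, Optional, Tuple

# (candidate, score): one output of a single epsilon-DP run (assumption).
Candidate = Tuple[Any, float]


def random_stopping_selection(
    run_once: Callable[[], Candidate],
    gamma: float,
    rng: random.Random,
) -> Tuple[Any, float, int]:
    """Private selection with random stopping (sketch of Liu-Talwar).

    Assumes run_once is itself epsilon-DP, including how it computes the
    score. After each run, halt with probability gamma and return the
    highest-scoring candidate seen so far, its score, and the number of runs.
    The number of runs is geometric with mean 1 / gamma, independent of data.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    best: Optional[Candidate] = None
    runs = 0
    while True:
        candidate = run_once()
        runs += 1
        if best is None or candidate[1] > best[1]:
            best = candidate  # keep the best candidate seen so far
        if rng.random() < gamma:  # data-independent coin flip
            return best[0], best[1], runs


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical configuration names; a real run would fit a DP generative
    # model and score its output with DP acceptance metrics.
    configs = ["config_a", "config_b", "config_c"]

    def toy_run_once() -> Candidate:
        return rng.choice(configs), rng.random()

    print(random_stopping_selection(toy_run_once, gamma=0.2, rng=rng))
```

The appeal of this design is that the privacy cost does not grow with the number of configurations tried. The search length is random and data-independent, so the analysis charges a constant multiple of ε no matter how many runs actually occur.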

Paper Structure (80 sections, 15 theorems, 38 equations, 1 figure, 2 tables, 5 algorithms)

Introduction
Requirements and Solution Concepts
Privacy
Format
Quality
Configurations
Faithfulness
Face Privacy
Transparency
Universal Scheme for Differentially Private Synthetic Data
Leveraging Public Data
The Release
Summary of our Contributions
Organization of this Paper
The Components of the Universal Scheme
...and 65 more sections

Key Result

Proposition 1

Let $\mathpzc{M}_1, \ldots, \mathpzc{M}_k$ be mechanisms such that $\mathpzc{M}_i$ is $\varepsilon_i$-differentially private for each $i$. Then their combination $\mathpzc{M}$, defined by $\mathpzc{M}(x) = (\mathpzc{M}_1(x), \ldots, \mathpzc{M}_k(x))$, is $\left(\sum_{i=1}^k \varepsilon_i\right)$-differentially private.
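Proposition 1 is what allows a release to split a single budget across several DP steps: the ε values of the individual mechanisms simply add up. Below is a minimal Python sketch of a pure-DP budget accountant under basic composition. The class name `PureDPAccountant` and the step labels are hypothetical. Only the total $\varepsilon = 9.98$ comes from the release, and the per-step split in the usage example is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PureDPAccountant:
    """Pure-DP budget tracking under basic composition (Proposition 1).

    Running mechanisms with budgets eps_1, ..., eps_k on the same data costs
    eps_1 + ... + eps_k in total, so each charge is simply added up.
    """

    total_budget: float
    ledger: List[Tuple[str, float]] = field(default_factory=list)
    tolerance: float = 1e-9  # absorbs floating-point rounding in the sums

    def spent(self) -> float:
        return sum(eps for _, eps in self.ledger)

    def remaining(self) -> float:
        return self.total_budget - self.spent()

    def charge(self, label: str, epsilon: float) -> None:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining() + self.tolerance:
            raise RuntimeError(f"charging {label!r} would exceed the budget")
        self.ledger.append((label, epsilon))


if __name__ == "__main__":
    acct = PureDPAccountant(total_budget=9.98)
    # Hypothetical split of the total budget across pipeline steps.
    acct.charge("configuration search", 6.0)
    acct.charge("published accuracy metrics", 3.98)
    print(f"spent={acct.spent():.2f} remaining={acct.remaining():.2f}")
```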

Figures (1)

Figure 1: Schematic overview of the universal scheme for differentially private microdata release. A red arrow ($\rightarrow$) represents a computation done with differential privacy.

Theorems & Definitions (35)

Definition 1: Bounded Neighboring Datasets
Definition 2: Differential Privacy [Dwork2006CalibratingNT]
Proposition 1: Differential Privacy Basic Composition [DworkKMMN06]
Proposition 2: Differential Privacy Post-Processing [Dwork2006CalibratingNT]
Definition 3: $(\alpha, \beta)$-faithfulness
Definition 4: Maximal-$\beta$-faithfulness
Definition 5: Acceptance criterion
Definition 6: Clipping Function
Definition 7: Global Sensitivity
Theorem 1: Laplace Mechanism [Dwork2006CalibratingNT]
...and 25 more
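Definitions 1, 6, and 7 and Theorem 1 fit together in the standard way:

1. Clip each record to a public range.
2. Bound the global sensitivity of the statistic over bounded neighboring datasets (same size n, one record replaced).
3. Add Laplace noise whose scale is the sensitivity divided by ε.

Below is a minimal Python sketch for the mean of one numeric field. It assumes a public clipping range, and the function names, clipping bounds, and birth-weight framing are hypothetical. It samples Laplace noise as the difference of two Exp(1) draws. This is distributionally correct but, like any textbook floating-point implementation, it is not hardened against floating-point attacks.

```python
import random
from typing import Sequence


def clip(value: float, lower: float, upper: float) -> float:
    """Clipping function: project value onto the interval [lower, upper]."""
    return min(max(value, lower), upper)


def clipped_mean_sensitivity(lower: float, upper: float, n: int) -> float:
    """Global sensitivity of the clipped mean over bounded neighbors.

    Bounded neighboring datasets share the size n and differ in one record,
    so replacing that record moves the clipped sum by at most upper - lower
    and the mean by at most (upper - lower) / n.
    """
    return (upper - lower) / n


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale): the difference of two i.i.d. Exp(1) draws
    is Laplace(0, 1), which is then rescaled."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_clipped_mean(
    values: Sequence[float],
    lower: float,
    upper: float,
    epsilon: float,
    rng: random.Random,
) -> float:
    """Laplace mechanism for the clipped mean (epsilon-DP, bounded neighbors)."""
    n = len(values)  # identical across bounded neighbors, so usable as is
    if n == 0 or epsilon <= 0 or upper <= lower:
        raise ValueError("need n > 0, epsilon > 0 and lower < upper")
    true_mean = sum(clip(v, lower, upper) for v in values) / n
    scale = clipped_mean_sensitivity(lower, upper, n) / epsilon
    return true_mean + laplace_noise(scale, rng)


if __name__ == "__main__":
    # Hypothetical birth weights in grams and a hypothetical public range.
    weights = [3100.0, 2950.0, 3600.0, 4100.0, 2400.0]
    print(dp_clipped_mean(weights, 500.0, 5000.0, epsilon=0.5,
                          rng=random.Random(7)))
```

A narrower clipping range lowers the noise scale but biases the mean when many true values fall outside it. This trade-off is why the range is treated as a public, pre-chosen parameter rather than being computed from the data.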
