Differentially Private Release of Israel's National Registry of Live Births
Shlomi Hod, Ran Canetti
TL;DR
The authors present the first real-world differentially private release of Israel's National Registry of Live Births for 2014. The release contains six microdata-style fields for singleton births (n ≈ 165k–167k) under a total DP budget of $\varepsilon = 9.98$, and it was co-designed with stakeholders. They introduce a universal, algorithm-agnostic scheme that obtains DP synthetic data by exploring a configurable space of data transformations, generative models, and hyperparameters. The output must satisfy eight acceptance criteria, which bound accuracy on targeted queries (marginals, conditional means, linear regression) and impose a faithfulness constraint. A key ingredient is a private selection mechanism that bounds the privacy cost of the whole configuration search rather than charging for each candidate separately. This yields end-to-end DP guarantees and lets the authors publish DP-validated accuracy metrics alongside the data. The work also introduces the notions of faithfulness and face privacy to improve trust and avoid unique records. It shows how public data, PCA-like configuration strategies, and careful privacy-budget allocation can produce a usable, government-grade release with DP guarantees and actionable documentation for data subjects and data users. The release demonstrates that DP is viable for government medical data and offers a blueprint for future synthetic-data-based microdata releases. The authors close with reflections on future improvements and broader implications for policy-relevant research.
Abstract
In February 2024, Israel's Ministry of Health released microdata of live births in Israel in 2014. The dataset is based on Israel's National Registry of Live Births and offers substantial value in multiple areas, such as scientific research and policy-making, while providing a pure differential privacy guarantee with $\varepsilon = 9.98$ for the mothers and newborns of 2014. The release was co-designed by the authors together with stakeholders from both inside and outside the Ministry of Health. This paper presents the methodology used to obtain that release, which, to the best of our knowledge, is the first of its kind in the world. The design process was challenging and required flexibility and open-mindedness from all parties involved, along with substantial technical innovation. In particular, we introduce new concepts regarding the desiderata for dataset releases in microdata format, as well as a way to bundle multiple quantitative desiderata into a single differentially private release using the private selection algorithm of Liu and Talwar (STOC 2019). We hope that the experiences reported here will be useful for future differentially private releases.
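The private selection step can be illustrated with a minimal Python sketch of the random-stopping variant of Liu and Talwar's algorithm. The sketch assumes a hypothetical callback `run_candidate` that samples one configuration (data transformation, generative model, hyperparameters), trains and evaluates it, and returns the candidate together with a score. The whole callback is assumed to be itself ε-DP. Under that assumption, Liu and Talwar show that stopping after each run with probability γ and returning the best candidate seen so far is 3ε-DP. The function names, the toy configuration space and the Laplace-noised scores are illustrative assumptions, not the authors' pipeline. Their release may use a different variant of the algorithm, for example one that stops at the first candidate passing a threshold.

```python
import random
from typing import Callable, Optional, Tuple, TypeVar

C = TypeVar("C")


def laplace(rng: random.Random, scale: float) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def random_stopping_selection(
    run_candidate: Callable[[random.Random], Tuple[C, float]],
    gamma: float,
    rng: random.Random,
) -> Tuple[C, float]:
    """Sketch of random-stopping private selection (after Liu & Talwar, 2019).

    `run_candidate` is a hypothetical callback: it samples a configuration,
    evaluates it, and returns (candidate, score). ASSUMPTION: the entire
    callback is eps-DP with respect to the sensitive data.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    best: Optional[Tuple[C, float]] = None
    while True:
        candidate, score = run_candidate(rng)
        # Keep the highest-scoring candidate seen so far.
        if best is None or score > best[1]:
            best = (candidate, score)
        # Halt after each run with probability gamma, so the number of runs is
        # geometric with mean 1/gamma and does not depend on the data.
        if rng.random() < gamma:
            return best


if __name__ == "__main__":
    # Hypothetical configuration names and "true" quality values, for illustration only.
    TRUE_QUALITY = {"config_a": 0.2, "config_b": 0.5, "config_c": 0.9}

    def toy_run(r: random.Random) -> Tuple[str, float]:
        name = r.choice(sorted(TRUE_QUALITY))
        # A real pipeline would compute this score with an eps-DP evaluation
        # of the acceptance criteria on the sensitive data.
        return name, TRUE_QUALITY[name] + laplace(r, scale=0.1)

    print(random_stopping_selection(toy_run, gamma=0.2, rng=random.Random(0)))
```

The design choice here is that the privacy cost does not grow with the number of configurations tried, because the number of runs is random and independent of the data. The price is the constant factor of 3 in ε and an expected 1/γ runs of the candidate procedure.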
