Private and Collaborative Kaplan-Meier Estimators
Shadi Rahimian, Raouf Kerkouche, Ina Kurth, Mario Fritz
TL;DR
This work addresses privacy-preserving joint Kaplan-Meier (KM) estimation across multiple data-holding sites. It introduces two differentially private methods, DP-Surv and DP-Prob, which directly perturb survival-related representations, with DP-Matrix$^+$ as a baseline, and a surrogate-dataset generator that converts flexibly between data representations. The authors present a taxonomy of seven collaboration paths for building a global private KM estimator and show, in experiments on real medical datasets, that a joint estimator with a privacy budget of $\varepsilon = 1$ can closely match the centralized non-private KM curve while protecting individual records. Together, the surrogate-dataset approach and the multi-representation DP strategies support privacy-preserving collaboration on survival analysis in cross-institutional studies. Future work includes extending the DP framework to censored-data scenarios and refining the sensitivity analysis to improve utility under censoring.
Abstract
Kaplan-Meier estimators are essential tools in survival analysis, capturing the survival behavior of a cohort. Their accuracy improves with large, diverse datasets, encouraging data holders to collaborate for more precise estimations. However, these datasets often contain sensitive individual information, necessitating stringent data protection measures that preclude naive data sharing. In this work, we introduce two novel differentially private methods that offer flexibility in applying differential privacy to various functions of the data. Additionally, we propose a synthetic dataset generation technique that enables easy and rapid conversion between different data representations. Utilizing these methods, we propose various paths that allow a joint estimation of the Kaplan-Meier curves with strict privacy guarantees. Our contribution includes a taxonomy of methods for this task and an extensive experimental exploration and evaluation based on this structure. We demonstrate that our approach can construct a joint, global Kaplan-Meier estimator that adheres to strict privacy standards ($\varepsilon = 1$) while exhibiting no statistically significant deviation from the nonprivate centralized estimator.
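For readers less familiar with the object being protected, the following is a minimal sketch of the standard Kaplan-Meier product-limit estimator, followed by a generic Laplace-mechanism perturbation of the resulting curve. It is an illustration only and is not the authors' DP-Surv or DP-Prob method. The function names `kaplan_meier` and `laplace_perturb_curve` are hypothetical, and the `sensitivity` argument is an assumption: the caller must supply a valid $L_1$ sensitivity bound for the curve, which this sketch does not derive.

```python
import numpy as np


def kaplan_meier(durations, events):
    """Product-limit estimate: returns (event_times, survival_probabilities).

    durations: observed time for each individual
    events:    1/True if the event was observed, 0/False if censored
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)

    # The curve only steps down at times where at least one event occurred.
    event_times = np.unique(durations[events])
    survival = np.empty(event_times.shape[0])

    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(durations >= t)            # still under observation at t
        deaths = np.sum((durations == t) & events)  # events exactly at t
        s *= 1.0 - deaths / at_risk
        survival[i] = s
    return event_times, survival


def laplace_perturb_curve(survival, epsilon, sensitivity, rng=None):
    """Generic Laplace mechanism on a curve (illustrative, not the paper's method).

    `sensitivity` is an ASSUMED, caller-supplied L1 sensitivity of the whole
    curve. Clipping and the running minimum are post-processing steps, so they
    do not weaken the privacy guarantee of the noisy release.
    """
    rng = np.random.default_rng() if rng is None else rng
    survival = np.asarray(survival, dtype=float)
    noisy = survival + rng.laplace(0.0, sensitivity / epsilon, size=survival.shape)
    noisy = np.clip(noisy, 0.0, 1.0)          # keep values valid probabilities
    return np.minimum.accumulate(noisy)       # enforce a non-increasing curve


# Toy cohort: six individuals, two of them censored (event flag 0).
times, surv = kaplan_meier([5, 6, 6, 2, 4, 4], [1, 0, 1, 1, 1, 0])
noisy_surv = laplace_perturb_curve(
    surv, epsilon=1.0, sensitivity=0.5, rng=np.random.default_rng(0)
)
```

The sensitivity value `0.5` in the usage lines is a placeholder chosen for the toy example, not a bound taken from the paper. In practice the privacy guarantee holds only if the supplied value really bounds how much one individual's record can change the curve.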
