Improving Epidemic Analyses with Privacy-Preserving Integration of Sensitive Data

Zihan Guan; Zhiyuan Zhao; Fengwei Tian; Dung Nguyen; Payel Bhattacharjee; Ravi Tandon; B. Aditya Prakash; Anil Vullikanti

Abstract

Epidemic analyses increasingly rely on heterogeneous datasets, many of which are sensitive and require strong privacy protection. Although differential privacy (DP) has become a standard in machine learning and data sharing, its adoption in epidemiological modeling remains limited. In this work, we introduce DPEpiNN, a unified framework that integrates deep neural networks with a mechanistic SEIRM-based metapopulation model under formal DP guarantees. DPEpiNN supports multiple epidemic tasks within a single differentiable pipeline: multi-step forecasting, nowcasting, estimation of the effective reproduction number $R_t$, and intervention analysis. The framework jointly learns epidemic parameters from heterogeneous public and sensitive datasets, and ensures privacy by perturbing the sensitive inputs before they are used. We evaluate DPEpiNN using COVID-19 data from three regions. Results show that incorporating sensitive datasets substantially improves predictive performance even under strong privacy constraints. Compared with a deep learning baseline, DPEpiNN achieves higher accuracy in forecasting and nowcasting while producing reliable estimates of $R_t$. Furthermore, because the sensitive data enter only through the perturbed inputs, the learned epidemic transmission models inherit the same privacy guarantee through the post-processing property of differential privacy, which enables downstream policy analyses such as simulation of social distancing interventions. Our work demonstrates that interpretability (through mechanistic modeling), predictive accuracy (through neural integration), and rigorous privacy guarantees can be jointly achieved in modern epidemic modeling.
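To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the DPEpiNN implementation. It assumes the Laplace mechanism as the input perturbation (the abstract does not say which mechanism is used), a sensitivity of 1 with each individual affecting at most one entry of the series, and a generic single-patch, discrete-time SEIRM update. The function names `perturb_counts` and `seirm_step` and all parameter values are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Laplace mechanism on each entry of a sensitive count series.

    Assumption: one individual changes at most one entry by at most
    `sensitivity`; otherwise the privacy budget must be composed.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


def seirm_step(state, beta, alpha, gamma, mu):
    """One discrete-time step of a generic single-patch SEIRM model.

    state = (S, E, I, R, M); the rates are illustrative, not fitted values.
    """
    S, E, I, R, M = state
    N = S + E + I + R                 # living population
    new_exposed = beta * S * I / N    # S -> E
    new_infectious = alpha * E        # E -> I
    new_recovered = gamma * I         # I -> R
    new_deaths = mu * I               # I -> M
    return (
        S - new_exposed,
        E + new_exposed - new_infectious,
        I + new_infectious - new_recovered - new_deaths,
        R + new_recovered,
        M + new_deaths,
    )


# Hypothetical sensitive daily counts, privatized once at the input.
noisy_counts = perturb_counts([12, 15, 9, 20], epsilon=1.0, seed=0)

# Anything computed from `noisy_counts` alone (here a toy seed for I)
# is post-processing and consumes no additional privacy budget.
initial_infectious = max(noisy_counts[0], 1.0)
state = (1000.0 - initial_infectious, 0.0, initial_infectious, 0.0, 0.0)
for _ in range(7):
    state = seirm_step(state, beta=0.3, alpha=0.2, gamma=0.1, mu=0.01)
```

The design point is that noise is added once, before any learning or simulation. Every later step, including fitting parameters and simulating interventions, reads only the perturbed series, so the post-processing property applies without further accounting.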
