Simulation-based Bayesian Inference from Privacy Protected Data
Yifei Xiong, Nianqiao Phyllis Ju, Sanguo Zhang
TL;DR
This work addresses the challenge of performing valid statistical inference when only differentially private (DP) outputs are available. It proposes three likelihood-free approaches for learning from privatized data: sequential Monte Carlo approximate Bayesian computation (SMC-ABC), sequential private posterior estimation (SPPE), and sequential private likelihood estimation (SPLE). The latter two rely on neural density estimators (notably normalizing flows), and randomized quasi-Monte Carlo (RQMC) is used to reduce variance. The methods are demonstrated on a DP-privatized SIR disease-spread model and on Bayesian linear regression, where SPPE and SPLE achieve accuracy comparable to SMC-ABC with substantially fewer simulations while correcting for the biases introduced by the privacy mechanisms. Overall, the framework enables reliable inference and uncertainty quantification from privacy-protected data, offering a path toward privacy-preserving data sharing and analysis for models with complex, intractable likelihoods.
Abstract
Many modern statistical analysis and machine learning applications require training models on sensitive user data. Under a formal definition of privacy protection, differentially private algorithms inject calibrated noise into the confidential data or during the data analysis process to produce privacy-protected datasets or queries. However, restricting access to only privatized data during statistical analysis makes it computationally challenging to make valid statistical inferences. In this work, we propose methods for simulation-based inference from privacy-protected datasets. In addition to sequential Monte Carlo approximate Bayesian computation, we adopt neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
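
To make the idea of simulation-based inference from a private query concrete, the following is a minimal sketch and not the paper's method. It uses plain rejection ABC (a simpler relative of SMC-ABC) on a toy Bernoulli model, with the sample mean released through the Laplace mechanism. The model, the uniform prior, the tolerance, and the function names `privatize_mean` and `abc_rejection` are all assumptions chosen for illustration. The point it shows is that the simulator reproduces both the data-generating process and the privacy noise, so the accepted parameter draws account for the noise rather than treating the private output as exact.

```python
import numpy as np

def privatize_mean(x, epsilon, rng, lo=0.0, hi=1.0):
    """Release the mean of x (clamped to [lo, hi]) via the Laplace mechanism."""
    x = np.clip(x, lo, hi)
    # Replacing one record changes the clamped mean by at most (hi - lo) / n.
    sensitivity = (hi - lo) / len(x)
    return x.mean() + rng.laplace(scale=sensitivity / epsilon)

def abc_rejection(s_obs, n, epsilon, rng, n_sims=20000, tol=0.02):
    """Rejection ABC for theta in a toy Bernoulli(theta) model, Uniform(0, 1) prior.

    Each simulation draws theta from the prior, simulates confidential data,
    and then applies the same privacy noise as the release mechanism.
    """
    theta = rng.uniform(0.0, 1.0, size=n_sims)
    counts = rng.binomial(n, theta)                # simulated confidential data
    noise = rng.laplace(scale=1.0 / (n * epsilon), size=n_sims)
    s_sim = counts / n + noise                     # simulated private query
    keep = np.abs(s_sim - s_obs) <= tol            # accept if close to observed
    return theta[keep]

rng = np.random.default_rng(0)
n, epsilon, theta_true = 200, 1.0, 0.3             # illustrative values only
x = rng.binomial(1, theta_true, size=n)            # confidential data (never released)
s_obs = privatize_mean(x, epsilon, rng)            # the only quantity the analyst sees
posterior = abc_rejection(s_obs, n, epsilon, rng)
print(len(posterior), posterior.mean(), posterior.std())
```

Rejection ABC is wasteful because most prior draws are discarded; sequential schemes and learned density estimators of the kind described in the abstract aim to spend simulations more efficiently.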
