Private Gradient Estimation is Useful for Generative Modeling
Bochao Liu, Pengju Wang, Weijia Guo, Yong Li, Liansheng Zhuang, Weiping Wang, Shiming Ge
TL;DR
The paper tackles privacy leakage in generative modeling by introducing Private Gradient Estimation (PGE), which privately learns the score of the private data with an energy-based model trained using sliced score matching and randomized response. It then generates data through Hamiltonian dynamics-based sampling, enabling high-resolution outputs up to $256\times256$ while satisfying $(\varepsilon,0)$-DP. Key contributions include the private gradient estimation framework with a residual enhancement module, a DP-enabled sampling procedure, formal privacy and convergence analyses, and extensive experiments showing improved utility and realism over prior DP methods. In practice, the approach enables privacy-preserving generation at high resolution with competitive data utility, so the generated data can be used for downstream tasks without exposing sensitive information.
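To make the combination of sliced score matching and randomized response concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes Rademacher ($\pm 1$) projection vectors, the textbook binary randomized response that keeps each sign with probability $e^{\varepsilon}/(1+e^{\varepsilon})$, and a toy analytic score $s(x) = -x$ (a standard Gaussian) in place of the trained energy-based model. The names `randomized_response_signs` and `sliced_score_matching_loss` are hypothetical.

```python
import numpy as np

def randomized_response_signs(v, epsilon, rng):
    """Binary randomized response on a +/-1 array (an assumed mechanism).

    Each entry is kept with probability e^eps / (1 + e^eps) and
    flipped otherwise.
    """
    keep_prob = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = rng.random(v.shape) < keep_prob
    return np.where(keep, v, -v)

def sliced_score_matching_loss(score_fn, jvp_fn, x, v):
    """Monte Carlo estimate of E[v^T (grad_x s(x)) v + 0.5 (v^T s(x))^2].

    score_fn(x) returns s(x) with shape (n, d).
    jvp_fn(x, v) returns the Jacobian-vector product (grad_x s(x)) v, shape (n, d).
    """
    s = score_fn(x)
    jv = jvp_fn(x, v)
    return np.mean(np.sum(v * jv, axis=1) + 0.5 * np.sum(v * s, axis=1) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((512, 8))              # stand-in for private data
v = rng.choice([-1.0, 1.0], size=x.shape)      # Rademacher projection vectors
v_private = randomized_response_signs(v, epsilon=1.0, rng=rng)

# Toy score of a standard Gaussian: s(x) = -x, so its Jacobian is -I
# and the Jacobian-vector product is simply -v.
loss = sliced_score_matching_loss(lambda x: -x, lambda x, v: -v, x, v_private)
```

In a real model, `jvp_fn` would come from automatic differentiation of the score network rather than a closed form.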
Abstract
While generative models have proved successful in many domains, they may pose a privacy leakage risk in practical deployment. To address this issue, differentially private generative model learning has emerged as a solution to train private generative models for different downstream tasks. However, existing private generative modeling approaches face significant challenges in generating high-dimensional data due to the inherent complexity involved in modeling such data. In this work, we present a new private generative modeling approach where samples are generated via Hamiltonian dynamics with gradients of the private dataset estimated by a well-trained network. In this approach, we achieve differential privacy by perturbing the projection vectors used to estimate gradients with sliced score matching. In addition, we enhance the reconstruction ability of the model by incorporating a residual enhancement module during score matching. For sampling, we perform Hamiltonian dynamics with gradients estimated by the well-trained network, moving the sampled data toward the private dataset's manifold step by step. In this way, our model is able to generate data with a resolution of $256\times256$. Extensive experiments and analysis clearly demonstrate the effectiveness and rationality of the proposed approach.
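The sampling stage described above can be illustrated with a generic leapfrog integrator driven by a score function. This is a sketch under stated assumptions, not the paper's sampler: it uses standard Hamiltonian dynamics with a fresh Gaussian momentum per iteration and no Metropolis correction, and it substitutes the analytic score of a Gaussian $\mathcal{N}(\mu, I)$ for the trained network. The function name `hamiltonian_step` and all step sizes are hypothetical.

```python
import numpy as np

def hamiltonian_step(score_fn, x, step_size, n_leapfrog, rng):
    """One Hamiltonian trajectory using leapfrog integration.

    The potential energy is U(x) = -log p(x), so the force on the
    momentum is -grad U(x) = score_fn(x).
    """
    p = rng.standard_normal(x.shape)             # resample momentum
    p = p + 0.5 * step_size * score_fn(x)        # initial half step for momentum
    for i in range(n_leapfrog):
        x = x + step_size * p                    # full step for position
        # Full momentum steps in the interior, half step at the end.
        scale = 1.0 if i < n_leapfrog - 1 else 0.5
        p = p + scale * step_size * score_fn(x)
    return x

rng = np.random.default_rng(0)
mu = 3.0
score_fn = lambda x: -(x - mu)                   # score of N(mu, I), a toy stand-in

x = np.zeros((2000, 2))                          # start far from the target mean
for _ in range(50):
    x = hamiltonian_step(score_fn, x, step_size=0.1, n_leapfrog=10, rng=rng)

sample_mean = x.mean()
sample_std = x.std()
```

Starting from the origin, the samples drift toward the target distribution over successive trajectories, which mirrors the step-by-step movement toward the data manifold described in the abstract.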
