Private Gradient Estimation is Useful for Generative Modeling

Bochao Liu; Pengju Wang; Weijia Guo; Yong Li; Liansheng Zhuang; Weiping Wang; Shiming Ge

TL;DR

The paper tackles privacy leakage in generative modeling by introducing Private Gradient Estimation (PGE), which privately learns the score of the private data via an energy-based model trained with sliced score matching and randomized response. It then generates data through Hamiltonian dynamics-based sampling, enabling high-resolution outputs up to $256\times256$ while satisfying $(\varepsilon,0)$-DP. Key contributions include the private gradient estimation framework with a residual enhancement module, a DP-enabled sampling procedure, and formal privacy/convergence analyses, plus extensive experiments showing improved utility and realism over prior DP methods. The work has practical impact by enabling privacy-preserving generation at high resolutions with competitive data utility, suitable for downstream tasks without compromising sensitive information.

Abstract

While generative models have proved successful in many domains, they may pose a privacy leakage risk in practical deployment. To address this issue, differentially private generative model learning has emerged as a solution to train private generative models for different downstream tasks. However, existing private generative modeling approaches face significant challenges in generating high-dimensional data due to the inherent complexity involved in modeling such data. In this work, we present a new private generative modeling approach where samples are generated via Hamiltonian dynamics with gradients of the private dataset estimated by a well-trained network. In the approach, we achieve differential privacy by perturbing the projection vectors in the estimation of gradients with sliced score matching. In addition, we enhance the reconstruction ability of the model by incorporating a residual enhancement module during the score matching. For sampling, we perform Hamiltonian dynamics with gradients estimated by the well-trained network, allowing the sampled data to approach the private dataset's manifold step by step. In this way, our model is able to generate data with a resolution of $256\times256$. Extensive experiments and analysis clearly demonstrate the effectiveness and rationality of the proposed approach.
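
For orientation, the standard sliced score matching objective from the general literature can be written as below. This is a reference form only, not necessarily the exact loss used in PGE, which additionally perturbs the projection vectors. The symbols $s_{\theta}$ (the learned score, i.e. the estimated gradient of the log-density), $p_{\boldsymbol{v}}$ (the distribution of random projection vectors) and $J(\theta)$ are notational assumptions introduced here.

$$
J(\theta) = \mathbb{E}_{\boldsymbol{v} \sim p_{\boldsymbol{v}}}\, \mathbb{E}_{\boldsymbol{x} \sim p(\boldsymbol{x})} \left[ \boldsymbol{v}^{\top} \nabla_{\boldsymbol{x}} s_{\theta}(\boldsymbol{x})\, \boldsymbol{v} + \frac{1}{2} \left( \boldsymbol{v}^{\top} s_{\theta}(\boldsymbol{x}) \right)^{2} \right]
$$

Projecting onto $\boldsymbol{v}$ replaces the full Jacobian trace of ordinary score matching with a single directional derivative, which is what makes the objective tractable for high-dimensional data.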

Paper Structure (13 sections, 6 theorems, 26 equations, 10 figures, 4 tables, 2 algorithms)

Introduction
Related Works
Preliminaries
Approach
Private Gradient Estimation
Sampling with Hamiltonian Dynamics
Convergence Analysis
Experiments
Experimental Setup
Experimental Results
Ablation Studies
Limitations
Conclusion

Key Result

Theorem 1

Our PGE satisfies $\varepsilon$-DP.
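
The Figure 2 caption states that randomized response (RR) keeps the true projection direction with probability $e^{\varepsilon}/(e^{\varepsilon}+k-1)$. The following is a minimal sketch of $k$-ary RR over direction indices, assuming the remaining probability is spread evenly over the other $k-1$ directions. The function names `rr_probabilities` and `randomized_response` are hypothetical and not taken from the paper; the sketch only illustrates why the ratio between any two output probabilities is bounded by $e^{\varepsilon}$.

```python
import math
import random


def rr_probabilities(k: int, epsilon: float) -> list[float]:
    """Output distribution of k-ary RR when the true index is 0 (illustrative helper)."""
    denom = math.exp(epsilon) + k - 1
    # True index is kept with e^eps / denom; each other index gets 1 / denom.
    return [math.exp(epsilon) / denom] + [1.0 / denom] * (k - 1)


def randomized_response(true_index: int, k: int, epsilon: float,
                        rng: random.Random) -> int:
    """Report true_index with prob e^eps/(e^eps+k-1), else a uniform other index."""
    keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < keep:
        return true_index
    # Draw uniformly from the k-1 indices that differ from true_index.
    other = rng.randrange(k - 1)
    return other if other < true_index else other + 1


if __name__ == "__main__":
    probs = rr_probabilities(k=4, epsilon=1.0)
    # The largest-to-smallest probability ratio equals e^eps,
    # which is the bound required for eps-DP of this mechanism.
    print(max(probs) / min(probs), math.exp(1.0))
```

A larger $k$ or a smaller $\varepsilon$ lowers the chance of reporting the true direction, which is the privacy/utility trade-off the mechanism exposes.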

Figures (10)

Figure 1: Synthetic data generated by a generative model trained directly on private data may contain sensitive information. To address this, we achieve differentially private learning by private gradient estimation. Synthetic data generated by the resulting model can be used for different downstream tasks with privacy protection.
Figure 2: Overview of our PGE. We first sample some images $\boldsymbol{x}$ from the private data distribution $p(\boldsymbol{x})$. These images are then fed into the network $q_{\theta}$ for prediction. Concurrently, we encode these images using a pre-trained VQGAN and incorporate the masked version into the features extracted by the middle layer of $q_{\theta}$. This enhances the image reconstruction capability of $q_{\theta}$. Following the prediction by $q_{\theta}$, both $q_{\theta}(\boldsymbol{x})$ and $p(\boldsymbol{x})$ are projected for dimensionality reduction. During this process, we perturb their projection vectors by randomized response (RR) to achieve DP. Specifically, $\nabla \log p(\boldsymbol{x})$ is projected onto the $\boldsymbol{v}_1$ direction, while RR projects $q_{\theta}(\boldsymbol{x})$ onto the $\boldsymbol{v}_1$ direction with a probability of $e^{\varepsilon}/(e^{\varepsilon}+k-1)$, and onto each of the other directions with a probability of $1/(e^{\varepsilon}+k-1)$. Here, $k$ refers to the number of projection vectors. Finally, the network $q_{\theta}$ is updated by computing the loss between the predicted distribution $q_{\theta}(\boldsymbol{x})$ and the original distribution $p(\boldsymbol{x})$.
Figure 3: Overview of the sampling process. Given a well-trained network, it can predict the gradients required by Hamiltonian dynamics. After several rounds of sampling, the samples gradually converge from a noisy distribution to the distribution of private data.
Figure 4: Visualization results of DP-GAN, GS-WGAN, DP-MERF, P3GM, DataLens, DPGEN, DP-LDM and our PGE on CelebA at 32$\times$32 and 64$\times$64 resolutions.
Figure 5: Visualization results of CelebA and LSUN at 256$\times$256 resolution under $\varepsilon=20$.
...and 5 more figures
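
The sampling process in Figure 3 moves noisy samples toward the data distribution using gradients of the log-density. A minimal sketch of that idea follows, using plain leapfrog integration with momentum refresh and no Metropolis correction. It is not the paper's algorithm: the known score of a unit-variance Gaussian (`gaussian_score`) stands in for the trained network, and `leapfrog_sample` with its step size and round counts is a hypothetical helper chosen for illustration.

```python
import numpy as np


def gaussian_score(x, mean):
    """Gradient of log N(x; mean, I) w.r.t. x; a stand-in for a learned score network."""
    return mean - x


def leapfrog_sample(score_fn, x0, n_rounds=200, n_leapfrog=10, step=0.1, seed=0):
    """Run Hamiltonian dynamics driven by score_fn, refreshing momentum each round."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_rounds):
        p = rng.standard_normal(x.shape)      # fresh Gaussian momentum
        p = p + 0.5 * step * score_fn(x)      # initial half step for momentum
        for i in range(n_leapfrog):
            x = x + step * p                  # full step for position
            if i < n_leapfrog - 1:
                p = p + step * score_fn(x)    # full step for momentum
        p = p + 0.5 * step * score_fn(x)      # final half step for momentum
    return x


if __name__ == "__main__":
    target_mean = np.array([1.0, -2.0])
    start = np.full((2000, 2), 5.0)           # 2000 chains, all started far away
    samples = leapfrog_sample(lambda x: gaussian_score(x, target_mean), start)
    print(samples.mean(axis=0), samples.std(axis=0))
```

Replacing `gaussian_score` with a network's estimate of $\nabla \log p(\boldsymbol{x})$ gives the general shape of score-driven sampling; the quality of the samples then depends on how accurate that estimate is.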

Theorems & Definitions (6)

Theorem 1
Lemma 1
Lemma 2
Theorem 1
Lemma 3
Lemma 4
