Differentially Private Neural Tangent Kernels for Privacy-Preserving Data Generation

Yilin Yang; Kamil Adamczewski; Danica J. Sutherland; Xiaoxiao Li; Mijung Park

TL;DR

This work introduces DP-NTK, a practical framework for differentially private data generation that uses finite-dimensional empirical neural tangent kernel (e-NTK) features to build a kernel mean embedding and a maximum mean discrepancy (MMD) objective. DP-NTK privatizes the mean embedding of the real data once with the Gaussian mechanism and then trains a generator to minimize the MMD to this privatized embedding. It achieves strong privacy guarantees while maintaining high utility, even without public data. Theoretical analysis shows that the minimizer of the private objective closely tracks the non-private optimum, with favorable rates, and experiments on MNIST, FashionMNIST, CelebA, CIFAR-10, and tabular datasets show performance competitive with or better than state-of-the-art private generators. The approach offers a scalable, data-efficient route to privacy-preserving data synthesis for both image and tabular data.
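
The privatize-once, train-many pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes per-example feature vectors are rescaled to unit L2 norm, so that the mean embedding has sensitivity 2/m as stated in Proposition 3.1. The names `normalize_rows`, `privatize_mean_embedding`, `private_mmd_sq`, and the parameter `noise_multiplier` are hypothetical; mapping `noise_multiplier` to a target (epsilon, delta) requires a privacy accountant, which is not shown.

```python
import numpy as np


def normalize_rows(features):
    """Rescale each feature vector to unit L2 norm (assumed preprocessing)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, 1e-12)


def privatize_mean_embedding(features, noise_multiplier, rng):
    """One-shot Gaussian-mechanism release of the data mean embedding.

    features: (m, d) per-example features of the private data.
    noise_multiplier: hypothetical sigma; the noise std is sigma * (2 / m).
    Calibrating sigma to a target (epsilon, delta) needs a privacy
    accountant, which is not shown here.
    """
    m, d = features.shape
    mu = normalize_rows(features).mean(axis=0)
    sensitivity = 2.0 / m  # Proposition 3.1, assuming unit-norm features
    return mu + rng.normal(0.0, noise_multiplier * sensitivity, size=d)


def private_mmd_sq(mu_tilde, synth_features):
    """Squared MMD between the released embedding and a synthetic batch."""
    mu_q = normalize_rows(synth_features).mean(axis=0)
    return float(np.sum((mu_tilde - mu_q) ** 2))


# Usage: release the embedding once, then reuse it at every generator step.
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 16))      # stand-in for private-data features
mu_tilde = privatize_mean_embedding(real, noise_multiplier=1.0, rng=rng)
for step in range(3):
    synth = rng.normal(size=(64, 16))   # stand-in for generator-output features
    loss = private_mmd_sq(mu_tilde, synth)
```

Because the loss touches the private data only through `mu_tilde`, reusing it across training steps is post-processing and adds no privacy cost. Actually training a generator would additionally need an autodiff framework to backpropagate `loss` through the synthetic features.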

Abstract

Maximum mean discrepancy (MMD) is a particularly useful distance metric for differentially private data generation: when used with finite-dimensional features, it allows us to summarize and privatize the data distribution once and then reuse that summary repeatedly during generator training without further privacy loss. An important question in this framework is, then, which features are useful for distinguishing between real and synthetic data distributions, and whether they enable us to generate high-quality synthetic data. This work considers using the features of neural tangent kernels (NTKs), more precisely empirical NTKs (e-NTKs). We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of perceptual features taken from networks pre-trained on public data. As a result, our method improves the privacy-accuracy trade-off compared to other state-of-the-art methods, without relying on any public data, as demonstrated on several tabular and image benchmark datasets.
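
To make "e-NTK features" concrete: the e-NTK feature of an input x is the gradient of the network output with respect to all parameters, evaluated at the (here untrained) parameters, and the kernel is the inner product of these gradients. The sketch below assumes a hypothetical scalar-output, one-hidden-layer ReLU network f(x) = v . relu(W x) / sqrt(width) with standard-normal initialization; the architectures used in the paper may differ, and `init_params`, `forward`, and `entk_features` are illustrative names.

```python
import numpy as np


def init_params(d, width, rng):
    """Untrained parameters drawn from a standard normal (assumed init)."""
    return {"W": rng.normal(size=(width, d)), "v": rng.normal(size=width)}


def forward(params, X):
    """Scalar network f(x) = v . relu(W x) / sqrt(width), for each row of X."""
    W, v = params["W"], params["v"]
    return np.maximum(X @ W.T, 0.0) @ v / np.sqrt(W.shape[0])


def entk_features(params, X):
    """e-NTK features: row i is the gradient of f(x_i) w.r.t. (W, v), flattened."""
    W, v = params["W"], params["v"]
    n, width = X.shape[0], W.shape[0]
    scale = 1.0 / np.sqrt(width)
    pre = X @ W.T                                # (n, width) pre-activations
    grad_v = np.maximum(pre, 0.0) * scale        # df/dv, shape (n, width)
    gate = (pre > 0) * v * scale                 # v_j * 1[pre_j > 0] / sqrt(width)
    grad_W = gate[:, :, None] * X[:, None, :]    # df/dW, shape (n, width, d)
    return np.concatenate([grad_W.reshape(n, -1), grad_v], axis=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
params = init_params(d=3, width=32, rng=rng)
phi = entk_features(params, X)   # (5, 32*3 + 32) feature matrix
K = phi @ phi.T                  # e-NTK Gram matrix: K[i, j] = <phi(x_i), phi(x_j)>
```

The feature dimension equals the number of network parameters, so it grows with the width. Because this ReLU network is 2-homogeneous in its parameters, the inner product of the flattened parameters with phi(x) equals 2 f(x), which gives a quick check that the gradients are correct.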

Paper Structure (15 sections, 2 theorems, 13 equations, 4 figures, 5 tables, 1 algorithm)

Introduction
Background
Differential Privacy (DP)
Maximum Mean Discrepancy (MMD)
Neural Tangent Kernel (NTK)
The DP-NTK model
Theoretical analysis
Experiments
Generating MNIST and FashionMNIST images
Privacy-Width Trade-off
Varying privacy levels
Generating CelebA and CIFAR10 images
Generating tabular data
Summary and Discussion
Hyperparameters Used in Experiments

Key Result

Proposition 3.1

The global sensitivity of the mean embedding of the $m$ private data points is $\Delta_{\bm{\mu}_P} = 2/m$.
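
A short argument for the 2/m bound, under the assumptions that each feature vector has unit L2 norm and that neighboring datasets differ by replacing one example: replacing x_j by x'_j changes the mean embedding by (phi(x_j) - phi(x'_j)) / m, whose norm is at most (||phi(x_j)|| + ||phi(x'_j)||) / m = 2/m by the triangle inequality, with equality when phi(x'_j) = -phi(x_j). The numerical check below illustrates this; `replace_one_gap` is a hypothetical helper.

```python
import numpy as np


def replace_one_gap(unit_features, j, new_feature):
    """L2 change in the mean embedding when example j is replaced."""
    neighbor = unit_features.copy()
    neighbor[j] = new_feature
    return np.linalg.norm(unit_features.mean(axis=0) - neighbor.mean(axis=0))


rng = np.random.default_rng(0)
m, d = 50, 8
phi = rng.normal(size=(m, d))
phi /= np.linalg.norm(phi, axis=1, keepdims=True)  # unit-norm rows (assumption)

worst = replace_one_gap(phi, 0, -phi[0])      # antipodal replacement attains 2/m
random_gap = replace_one_gap(phi, 0, phi[1])  # any other unit vector stays below it
```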

Figures (4)

Figure 1: Generated samples of MNIST and FashionMNIST from DP-NTK with different widths $w$; all samples use the same DP noise level ($\epsilon=10$, $\delta = 10^{-5}$).
Figure 2: DP-NTK under different DP levels (left) and comparison with other models (right) on MNIST and FashionMNIST.
Figure 3: Synthetic $32 \times 32$ CelebA samples generated at different levels of privacy. Samples for DP-MERF and DP-Sinkhorn are taken from [?]. Our method yields samples of higher visual quality than the comparison methods, with an FID of 75, versus 189 for DP-Sinkhorn and 274 for DP-MERF (lower is better).
Figure 4: Generated and real images for the CIFAR-10 dataset. The FID scores of the proposed method are 104 ($\epsilon=\infty$) and 107 ($\epsilon=10$); those of DP-MERF are 127 ($\epsilon=\infty$) and 141 ($\epsilon=10$).

Theorems & Definitions (3)

Proposition 3.1
Proof
Proposition 4.1