VP-NTK: Exploring the Benefits of Visual Prompting in Differentially Private Data Synthesis
Chia-Yi Hsu, Jia-You Chen, Yu-Lin Tsai, Chih-Hsun Lin, Pin-Yu Chen, Chia-Mu Yu, Chun-Ying Huang
TL;DR
VP-NTK investigates whether visual prompting (VP) can improve differentially private (DP) data synthesis by reusing a pre-trained conditional generator and a fixed feature extractor within the DP-NTK framework. The method introduces per-class trainable visual prompts and a label-mapped training setup that preserves DP while aligning synthesized data with private data features through a combination of maximum mean discrepancy (MMD) and cosine similarity losses. Empirical results show significant downstream accuracy gains on high-resolution CelebA tasks under DP budgets, outperforming several state-of-the-art DP generative models. Ablation studies clarify the influence of hyperparameters such as noise level, learning rate, and loss components, underscoring VP's potential to improve the utility of high-resolution DP synthetic data.
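To make the combined objective concrete, the following is a minimal NumPy sketch of one way an MMD term and a cosine similarity term could be computed against a privatized feature mean. It is an illustration under stated assumptions, not the authors' implementation: the function names, the linear-kernel mean embedding, the noise calibration (sensitivity 2/n for unit-norm features), and the `weight` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def mean_embedding(features):
    """Average of L2-normalized feature rows (a linear-kernel mean embedding)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return unit.mean(axis=0)

def privatize_mean(mu, n, noise_multiplier, rng):
    """Gaussian mechanism on the mean embedding.

    Assumption: each row has unit norm, so replacing one of n records
    changes the mean by at most 2/n in L2 norm.
    """
    sensitivity = 2.0 / n
    return mu + rng.normal(0.0, noise_multiplier * sensitivity, size=mu.shape)

def mmd_squared(mu_a, mu_b):
    """Squared distance between two mean embeddings."""
    diff = mu_a - mu_b
    return float(diff @ diff)

def cosine_loss(mu_a, mu_b):
    """One minus the cosine similarity of two mean embeddings."""
    denom = np.linalg.norm(mu_a) * np.linalg.norm(mu_b)
    return float(1.0 - (mu_a @ mu_b) / max(denom, 1e-12))

def combined_loss(private_mu, synthetic_features, weight=1.0):
    """MMD term plus a weighted cosine term (hypothetical weighting)."""
    mu_syn = mean_embedding(synthetic_features)
    return mmd_squared(private_mu, mu_syn) + weight * cosine_loss(private_mu, mu_syn)
```

In this sketch the private data is touched only once, when the noisy mean is computed. The loss can then be evaluated repeatedly on synthetic features without further privacy cost, which is the usual argument for this style of training.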
Abstract
Differentially private (DP) synthetic data has become the de facto standard for releasing sensitive data. However, many DP generative models produce synthetic data of low utility, especially for high-resolution images. Meanwhile, visual prompting (VP) has emerged as a parameter-efficient fine-tuning (PEFT) technique that allows well-trained existing models to be reused and adapted to downstream tasks. In this work, we explore whether VP can help construct compelling generative models under DP constraints. We show that VP in conjunction with DP-NTK, a DP generator that exploits the power of the neural tangent kernel (NTK) in training DP generative models, achieves a significant performance boost, particularly for high-resolution image datasets, with accuracy improving from 0.644$\pm$0.044 to 0.769. Lastly, we perform ablation studies on the parameters that influence the overall performance of VP-NTK. Our work demonstrates a promising step forward in improving the utility of DP synthetic data, particularly for high-resolution images.
