Is Diffusion Model Safe? Severe Data Leakage via Gradient-Guided Diffusion Model
Jiayang Meng, Tao Huang, Hong Chen, Cuiping Li
TL;DR
This paper demonstrates that gradients leaked in distributed or federated settings can be exploited to reconstruct high-resolution training images by fine-tuning a pre-trained diffusion model under gradient guidance. The authors formalize a gradient-guided fine-tuning objective that aligns generated gradients with leaked ones, enabling reconstruction at resolutions up to $512\times512$, far beyond prior low-resolution attacks such as DLG. Across CIFAR-10, CelebA-HQ, LSUN, and ImageNet, the approach outperforms SOTA baselines in pixel-level fidelity and time efficiency, and it remains partially effective against differential privacy defenses. The work highlights substantial privacy risks from gradient exchanges and points to future directions: strengthening defenses and scaling to higher resolutions, including exploring ViT-based diffusion models.
Abstract
Gradient leakage has been identified as a potential source of privacy breaches in modern image processing systems, where an adversary can completely reconstruct training images from leaked gradients. However, existing methods are restricted to reconstructing low-resolution images, so the data leakage risks of image processing systems have not been sufficiently explored. In this paper, by exploiting diffusion models, we propose a gradient-guided fine-tuning method and introduce a new reconstruction attack that is capable of stealing private, high-resolution images from image processing systems through leaked gradients, exposing severe data leakage. Our attack method is easy to implement and requires little prior knowledge. The experimental results indicate that existing reconstruction attacks can steal images only up to a resolution of $128 \times 128$ pixels, while our attack method can successfully recover and steal images with resolutions up to $512 \times 512$ pixels. Our attack method significantly outperforms the SOTA attack baselines in terms of both pixel-wise accuracy and time efficiency of image reconstruction. Furthermore, our attack can render differential privacy ineffective to some extent.
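To make the notion of matching gradients against leaked ones concrete, the following is a minimal sketch of a generic gradient-matching loss on a toy linear model. It is not the paper's diffusion-based objective, which is not specified in this summary. The model, the squared-error training loss, the assumption that the attacker knows the target `y`, and all names (`weight_gradient`, `gradient_matching_loss`, `x_private`) are hypothetical choices made for illustration only.

```python
import numpy as np

def weight_gradient(W, x, y):
    """Gradient of the toy training loss 0.5 * ||W @ x - y||^2 with respect to W."""
    residual = W @ x - y
    return np.outer(residual, x)

def gradient_matching_loss(candidate, leaked_grad, W, y):
    """Squared L2 distance between the candidate's gradient and the leaked gradient."""
    diff = weight_gradient(W, candidate, y) - leaked_grad
    return float(np.sum(diff ** 2))

# Toy setup (all values are illustrative assumptions, not from the paper).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))      # shared model weights, known to the attacker
x_private = rng.normal(size=8)   # private training input
y = rng.normal(size=3)           # training target, assumed known here

# The gradient that would be exchanged during training and observed by the attacker.
leaked_grad = weight_gradient(W, x_private, y)

# The loss vanishes for the true input and is positive for a perturbed candidate.
loss_true = gradient_matching_loss(x_private, leaked_grad, W, y)
loss_other = gradient_matching_loss(x_private + 1.0, leaked_grad, W, y)
print(loss_true, loss_other)
```

An attack in this family searches for a candidate that drives such a loss toward zero. According to the abstract, the paper performs that search by fine-tuning a diffusion model under gradient guidance rather than by optimizing pixels directly.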
