Enhancing Membership Inference Attacks on Diffusion Models from a Frequency-Domain Perspective
Puwei Lian, Yujun Cai, Songze Li, Bingkun Bao
TL;DR
This work reveals that existing membership inference attacks (MIAs) on diffusion models overlook a frequency-domain deficiency: high-frequency components are processed with greater variability, which attenuates the attacks' discriminative power. The authors formalize a general error-based MIA paradigm for diffusion models and demonstrate how transforming intermediate outputs into the frequency domain exposes weak signals in high-frequency content. They propose a plug-and-play high-frequency filter that suppresses high-frequency information via a Fourier-domain mask, improving attack performance across datasets and models at negligible time cost, and provide theoretical support showing increased membership advantage under filtering. Extensive experiments across DDIM, Stable Diffusion, and various datasets show consistent gains in ASR, AUC, and TPR@1%FPR, while analyses and ablations validate robustness to hyperparameters and defenses. The work offers a practical approach to evaluating and potentially mitigating privacy risks in diffusion-model deployments, and suggests broader applicability to frequency-aware attacks beyond the studied methods.
Abstract
Diffusion models have achieved tremendous success in image generation, but they also raise significant privacy and copyright concerns. Membership Inference Attacks (MIAs) are designed to ascertain whether specific data were used during a model's training phase. As current MIAs for diffusion models typically exploit the model's image prediction ability, we formalize them into a unified general paradigm that computes a membership score for membership identification. Under this paradigm, we empirically find that existing attacks overlook an inherent deficiency in how diffusion models process high-frequency information. As a result, member data with more high-frequency content tend to be misclassified as hold-out data, and hold-out data with less high-frequency content tend to be misclassified as member data. Moreover, we theoretically demonstrate that this deficiency reduces the membership advantage of attacks, thereby hindering effective discrimination between member data and hold-out data. Based on this insight, we propose a plug-and-play high-frequency filter module to mitigate the adverse effects of the deficiency, which can be seamlessly integrated into any attack within this general paradigm without additional time cost. Extensive experiments corroborate that this module significantly improves the performance of baseline attacks across different datasets and models.
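To make the idea of a high-frequency filter inside an error-based membership score concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a circular low-pass mask, a mean-squared-error score, and a simple threshold rule in which lower error suggests membership; the function names (`low_pass_mask`, `filter_high_freq`, `membership_score`, `infer_member`) and the `cutoff` parameter are hypothetical choices made for illustration. In a real attack, `prediction` would come from the diffusion model under test.

```python
import numpy as np


def low_pass_mask(height, width, cutoff):
    """Binary Fourier-domain mask that keeps only low frequencies.

    `cutoff` is a fraction of the Nyquist frequency (0.5 cycles/sample);
    the circular mask shape is an assumption made for this sketch.
    """
    fy = np.fft.fftfreq(height)[:, None]  # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(width)[None, :]   # horizontal frequencies in [-0.5, 0.5)
    radius = np.sqrt(fy ** 2 + fx ** 2)
    return (radius <= 0.5 * cutoff).astype(np.float64)


def filter_high_freq(image, cutoff):
    """Suppress high-frequency content of an (H, W) or (C, H, W) array."""
    mask = low_pass_mask(image.shape[-2], image.shape[-1], cutoff)
    spectrum = np.fft.fft2(image, axes=(-2, -1))
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to rounding error; np.real drops the negligible imaginary part.
    return np.real(np.fft.ifft2(spectrum * mask, axes=(-2, -1)))


def membership_score(target, prediction, cutoff=0.5):
    """Mean squared prediction error measured on low frequencies only."""
    error = filter_high_freq(target - prediction, cutoff)
    return float(np.mean(error ** 2))


def infer_member(score, threshold):
    """Assumed decision rule: a lower error suggests a training member."""
    return score < threshold
```

Because the filter is a single pair of FFTs applied to an error that the baseline attack already computes, a module of this kind adds little overhead, which is in line with the negligible time cost described above. The cutoff and threshold would need to be tuned per dataset and model.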
