Your Diffusion Model is Secretly a Certifiably Robust Classifier
Huanran Chen, Yinpeng Dong, Shitong Shao, Zhongkai Hao, Xiao Yang, Hang Su, Jun Zhu
TL;DR
This paper studies the provable robustness of diffusion-based classifiers and shows they enjoy an $O(1)$ Lipschitz constant. It introduces Noised Diffusion Classifiers, including Exact Posterior (EPNDC) and Approximated Posterior (APNDC) variants, to classify Gaussian-corrupted data via ELBO-based log-likelihoods and Bayes' theorem, further enhanced by randomized smoothing to tighten certified radii. APNDC, acting as an ensemble of EPNDC with minimal overhead, achieves state-of-the-art or competitive certified robustness on CIFAR-10 and ImageNet64x64 using a single pre-trained diffusion model without extra data, while two variance-reduction strategies dramatically cut time complexity. The work also derives a rigorous Lipschitz bound for diffusion classifiers and introduces efficient class-selection strategies (Sift-and-Refine) to scale to large class counts, highlighting both theoretical and practical advances in robust diffusion-based classification.
Abstract
Generative learning, recognized for its effective modeling of data distributions, offers inherent advantages in handling out-of-distribution instances, especially for enhancing robustness to adversarial attacks. Among generative approaches, diffusion classifiers, which build on powerful diffusion models, have demonstrated superior empirical robustness. However, a comprehensive theoretical understanding of their robustness is still lacking, raising concerns about their vulnerability to stronger future attacks. In this study, we prove that diffusion classifiers possess $O(1)$ Lipschitzness, and establish their certified robustness, demonstrating their inherent resilience. To achieve non-constant Lipschitzness, thereby obtaining much tighter certified robustness, we generalize diffusion classifiers to classify Gaussian-corrupted data. This involves deriving the evidence lower bounds (ELBOs) for these distributions, approximating the likelihood using the ELBO, and calculating classification probabilities via Bayes' theorem. Experimental results show the superior certified robustness of these Noised Diffusion Classifiers (NDCs). Notably, we achieve over 80% and 70% certified robustness on CIFAR-10 under adversarial perturbations with $\ell_2$ norms less than 0.25 and 0.5, respectively, using a single off-the-shelf diffusion model without any additional data.
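
The final step of the pipeline described above, turning per-class ELBO estimates into classification probabilities via Bayes' theorem, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `posterior_from_elbos`, the uniform class prior, and the numeric ELBO values are all assumptions made for illustration. Under a uniform prior, $p(y \mid x) \propto \exp(\mathrm{ELBO}_y)$, so the posterior reduces to a softmax over the ELBOs.

```python
import math


def posterior_from_elbos(elbos):
    """Turn per-class ELBO estimates into class probabilities.

    Assumption: a uniform prior p(y), so Bayes' theorem reduces to a
    softmax over the log-likelihood surrogates: p(y | x) ∝ exp(ELBO_y).
    """
    # Subtract the maximum before exponentiating for numerical stability;
    # this leaves the normalized probabilities unchanged.
    max_elbo = max(elbos)
    weights = [math.exp(e - max_elbo) for e in elbos]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical ELBO values (in nats) for three classes; in practice each
# would be estimated with a class-conditional diffusion model.
elbos = [-3051.2, -3049.8, -3060.0]
probs = posterior_from_elbos(elbos)

# The predicted class is the one with the largest approximate posterior.
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Because only differences between ELBOs matter after normalization, any term shared by all classes cancels in the softmax.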
