Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks
Dario Serez, Marco Cristani, Alessio Del Bue, Vittorio Murino, Pietro Morerio
TL;DR
Adversarial perturbations threaten classifier reliability, and existing defenses often require costly training. The authors propose a training-free adversarial purification framework built on pre-trained Multiple Latent Variable Generative Models (MLVGMs), whose multiple latent codes separate global, class-relevant information from local detail, including adversarial noise, so purification needs no task-specific retraining. The method encodes an input into latent codes $z^{\text{e}}_i$, samples $z^{\text{s}}_i$ from the priors, and decodes the interpolated latents $z_i = (1 - \alpha_i) z^{\text{e}}_i + \alpha_i z^{\text{s}}_i$, with $0 \le \alpha_i \le 1$, to produce purified images; the hyperparameters $\alpha_i$ can be found via Bayesian Optimization or set by fixed monotonic schedules. Experiments on CelebA Gender, CelebA identities, and Stanford Cars with StyleGAN2 and NVAE show that the approach is competitive with, or close to, specialized defenses (TRADES, A-VAE, ND-VAE) despite using smaller models and no task-specific training, highlighting the potential of MLVGMs as foundation models for defense. The work suggests a scalable path toward robust vision systems and motivates the release of stronger MLVGMs trained on billions of samples for broader downstream use.
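The interpolation step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`sample_prior`, `linear_schedule`, `interpolate_latents`), the standard normal prior, the linear schedule, and the convention that index 0 is the coarsest latent are all assumptions made for illustration. The encoder and decoder of a real MLVGM (for example NVAE or StyleGAN2) are left out, so the latents here are plain lists of floats.

```python
import random
from typing import List, Sequence

Latent = List[float]


def sample_prior(dim: int, rng: random.Random) -> Latent:
    """Draw z^s for one level. A standard normal prior is assumed here;
    the actual prior depends on the generative model."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def linear_schedule(num_levels: int) -> List[float]:
    """Hypothetical monotonic schedule: alpha rises linearly from 0 at the
    first level (assumed coarsest) to 1 at the last level (assumed finest)."""
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")
    if num_levels == 1:
        return [0.0]
    return [i / (num_levels - 1) for i in range(num_levels)]


def interpolate_latents(
    encoded: Sequence[Latent],
    sampled: Sequence[Latent],
    alphas: Sequence[float],
) -> List[Latent]:
    """Compute z_i = (1 - alpha_i) * z^e_i + alpha_i * z^s_i for every level i."""
    if not (len(encoded) == len(sampled) == len(alphas)):
        raise ValueError("encoded, sampled and alphas must have the same length")
    mixed: List[Latent] = []
    for z_e, z_s, alpha in zip(encoded, sampled, alphas):
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("each alpha must lie in [0, 1]")
        if len(z_e) != len(z_s):
            raise ValueError("latent dimensions must match at every level")
        mixed.append([(1.0 - alpha) * e + alpha * s for e, s in zip(z_e, z_s)])
    return mixed


# Toy usage: three latent levels standing in for the output of an encoder.
rng = random.Random(0)
encoded = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sampled = [sample_prior(len(z), rng) for z in encoded]
alphas = linear_schedule(len(encoded))
purified_latents = interpolate_latents(encoded, sampled, alphas)
```

With this schedule the first level keeps the encoded code unchanged ($\alpha = 0$) and the last level is fully re-sampled ($\alpha = 1$); in a full pipeline the resulting latents would be passed to the decoder to obtain the purified image.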
Abstract
Attackers can deliberately perturb a classifier's input with subtle noise, altering its final predictions. Among proposed countermeasures, adversarial purification employs generative networks to preprocess input images, filtering out adversarial noise. In this study, we propose a specific class of generators, termed Multiple Latent Variable Generative Models (MLVGMs), for adversarial purification. These models possess multiple latent variables that naturally disentangle coarse from fine features. Taking advantage of this property, we autoencode images to maintain class-relevant information while discarding and re-sampling fine details, including adversarial noise. The procedure is completely training-free and explores the generalization abilities of pre-trained MLVGMs on the downstream task of adversarial purification. Although large MLVGMs trained on billions of samples are not yet available, we show that smaller MLVGMs are already competitive with traditional methods and can be used as foundation models. Official code released at https://github.com/SerezD/gen_adversarial.
