Adversarial Masked Autoencoder Purifier with Defense Transferability
Yuan-Chih Chen, Chun-Shien Lu
TL;DR
The paper tackles adversarial robustness by introducing MAEP, an MAE-based adversarial purifier that operates at test time without needing extra training data. MAEP jointly optimizes a purification loss and a masked-language-modeling-inspired reconstruction objective, yielding defense transferability across datasets and strong attack generalization. Empirical results show that MAEP achieves competitive robustness on CIFAR-10, higher clean accuracy than diffusion-based purifiers, and notable transferability from CIFAR-10 to ImageNet, all with substantially faster inference and training. These findings suggest a practical, data-efficient path to robust purification with transformer-based architectures. MAEP also highlights the potential of LoRA-based lightweight finetuning to enhance cross-domain defense without heavy computational cost.
Abstract
The study of adversarial defense still struggles to combat advanced adversarial attacks. In contrast to most prior studies, which rely on diffusion models for test-time defense and thereby markedly increase inference time, we propose the Masked AutoEncoder Purifier (MAEP), which integrates the Masked AutoEncoder (MAE) into an adversarial purifier framework for test-time purification. While MAEP achieves promising adversarial robustness, its distinguishing features are model defense transferability and attack generalization, obtained without relying on additional data beyond the training dataset. To our knowledge, MAEP is the first MAE-based adversarial purifier. Extensive experimental results demonstrate that our method not only maintains clean accuracy with only a slight drop but also exhibits a small gap between clean and robust accuracy. Notably, MAEP trained on CIFAR-10 achieves state-of-the-art performance even when tested directly on ImageNet, outperforming existing diffusion-based models trained specifically on ImageNet.
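
To make the idea of a joint objective concrete, the following is a minimal sketch of how a purification term and an MAE-style masked reconstruction term could be combined into one loss. It is not the authors' formulation, which this section does not spell out. The use of mean squared error for both terms, the weight `lam`, the 50% mask ratio, and all function names (`split_patches`, `random_mask`, `joint_loss`) are assumptions chosen for illustration, and the "purified" and "reconstructed" outputs are stand-in values rather than the output of a real network.

```python
import random

def split_patches(image, patch):
    """Split a square 2D image (list of rows) into flattened, non-overlapping patches."""
    n = len(image)
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, n, patch)
        for c in range(0, n, patch)
    ]

def random_mask(num_patches, ratio, rng):
    """Return a list of bools; True marks a patch hidden from the encoder."""
    hidden = set(rng.sample(range(num_patches), int(num_patches * ratio)))
    return [i in hidden for i in range(num_patches)]

def mse(a, b):
    """Mean squared error between two equally sized flat vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def joint_loss(clean, purified, reconstructed, mask, lam=1.0):
    """Hypothetical joint objective: purification term + lam * masked reconstruction term."""
    # Purification term (assumed MSE): the purified output should match
    # the clean image on every patch.
    purify = sum(mse(p, c) for p, c in zip(purified, clean)) / len(clean)
    # Masked reconstruction term (MAE-style): scored only on hidden patches.
    hidden = [i for i, m in enumerate(mask) if m]
    recon = sum(mse(reconstructed[i], clean[i]) for i in hidden) / max(len(hidden), 1)
    return purify + lam * recon

# Toy usage on a 4x4 "image" split into four 2x2 patches.
rng = random.Random(0)
clean_img = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
clean = split_patches(clean_img, 2)
mask = random_mask(len(clean), 0.5, rng)
purified = [[v + 0.1 for v in p] for p in clean]   # stand-in purifier output
reconstructed = [list(p) for p in clean]           # stand-in perfect reconstruction
loss = joint_loss(clean, purified, reconstructed, mask, lam=0.5)
```

In this toy setup the reconstruction term is zero because the stand-in reconstruction is exact, so the loss reduces to the purification error alone. In practice both terms would be computed from the same network's outputs and the weighting between them would be a tuned hyperparameter.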
