
Adversarial Masked Autoencoder Purifier with Defense Transferability

Yuan-Chih Chen, Chun-Shien Lu

TL;DR

The paper tackles adversarial robustness by introducing MAEP, an MAE-based adversarial purifier that operates at test time without needing extra training data. MAEP jointly optimizes a purification loss and a reconstruction objective inspired by masked language modeling, yielding defense transferability across datasets and strong attack generalization. Empirical results show that MAEP achieves competitive robustness on CIFAR-10, higher clean accuracy than diffusion-based purifiers, and notable transferability from CIFAR-10 to ImageNet, all with substantially faster inference and training. These findings suggest a practical, data-efficient path to robust purification using transformer-based architectures. MAEP also highlights the potential of LoRA-based lightweight finetuning to strengthen cross-domain defense without heavy computational cost.
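The TL;DR mentions LoRA-based lightweight finetuning. As a rough illustration of why LoRA is cheap, the sketch below wraps a frozen weight matrix W with a trainable low-rank update B·A, so only the small factors are trained. This is the generic LoRA technique, not MAEP's code; the class name `LoRALinear` and the rank, alpha, and initialization choices are hypothetical assumptions, not the paper's reported settings.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


class LoRALinear:
    """Hypothetical generic LoRA layer: frozen W (d_out x d_in) plus scaled B @ A.

    Rank, alpha, and init are illustrative assumptions, not MAEP's settings.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, never updated
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapted layer
        # initially computes exactly the same output as the frozen W.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)  # (d_out x rank) @ (rank x d_in)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to one input vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Only A and B are trained; W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a 64 x 64 layer with rank 4, only 2 · 4 · 64 = 512 parameters are trainable instead of 4096, which is the source of LoRA's low finetuning cost.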

Abstract

Adversarial defense still struggles to counter advanced adversarial attacks. In contrast to most prior studies, which rely on diffusion models for test-time defense and thereby markedly increase inference time, we propose Masked AutoEncoder Purifier (MAEP), which integrates a Masked AutoEncoder (MAE) into an adversarial purifier framework for test-time purification. Beyond achieving promising adversarial robustness, MAEP notably features model defense transferability and attack generalization without relying on any additional data beyond the training dataset. To our knowledge, MAEP is the first adversarial purifier based on MAE. Extensive experimental results demonstrate that our method not only maintains clean accuracy with only a slight drop but also exhibits a small gap between clean and robust accuracy. Notably, MAEP trained on CIFAR10 achieves state-of-the-art performance even when tested directly on ImageNet, outperforming existing diffusion-based models trained specifically on ImageNet.
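MAE-style pre-training splits an image into non-overlapping patches and hides a random subset before reconstruction. The sketch below shows the patchify and random-masking steps on a toy grayscale image stored as nested lists. It illustrates generic MAE masking only; the helper names, the 4 x 4 patch size, and the 0.75 mask ratio (the default used in the original MAE work) are assumptions, not MAEP's confirmed configuration.

```python
import random


def patchify(image, patch):
    """Split an H x W image (list of rows) into non-overlapping patch x patch tiles.

    Tiles are returned in row-major order, each flattened to a list.
    """
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return patches


def random_masking(num_patches, mask_ratio, seed=0):
    """MAE-style random masking; returns sorted (visible_ids, masked_ids)."""
    rng = random.Random(seed)
    ids = list(range(num_patches))
    rng.shuffle(ids)
    num_keep = int(num_patches * (1 - mask_ratio))
    return sorted(ids[:num_keep]), sorted(ids[num_keep:])
```

A 32 x 32 image with 4 x 4 patches yields 64 patches; a 0.75 ratio keeps 16 visible and masks 48. At test time the purifier sees every patch (mask ratio 0), which is the kind of train-test discrepancy that the finetuning stage in Figure 1(b) is described as alleviating.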

Paper Structure

This paper contains 27 sections, 16 equations, 3 figures, 19 tables.

Figures (3)

  • Figure 1: Workflow of our method. (a) Pre-training stage: learn patch representations via masked patch prediction, with reconstruction driven by the purification loss. (b) Finetuning stage: alleviate the information loss caused by masked patches in the pre-training stage (a.k.a. the train-test discrepancy).
  • Figure 2: The purification loss learns the direction from $x_a$ to $x$, and the direction of $-\delta_a$ (the one-step adversarial perturbation taken along the negative gradient) is roughly the same as $\mathcal{P}(x_a)$ due to the equivalence relation in the paper (Eq:equivalent).
  • Figure 3: Comparison of clean images (left), adversarial images (middle), and images purified by MAEP (right) under AutoAttack. MAEP is trained on CIFAR10 and tested directly on ImageNet without any finetuning.
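Figure 2 relates the purification direction to the negated one-step adversarial perturbation $-\delta_a$. The toy sketch below makes that relation concrete under loudly stated assumptions: a linear logistic classifier stands in for the attacked model, $\delta_a$ is an FGSM-style step $\epsilon \cdot \mathrm{sign}(\nabla_x L)$, and the purification loss is taken to be a mean squared error to the clean input. None of this is MAEP's implementation; all helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_grad(w, b, x, y):
    """Gradient of the logistic loss of a linear model w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """One-step L_inf attack: x_a = x + delta_a, delta_a = eps * sign(grad)."""
    g = input_grad(w, b, x, y)
    delta = [eps * (1.0 if gi > 0 else -1.0 if gi < 0 else 0.0) for gi in g]
    return [xi + di for xi, di in zip(x, delta)], delta


def cosine(u, v):
    dot = sum(a * c for a, c in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(c * c for c in v)))


def purification_loss(purified, clean):
    """Assumed form: mean squared error between purifier output and clean input."""
    return sum((p - c) ** 2 for p, c in zip(purified, clean)) / len(clean)


w, b, x, eps = [0.5, -1.0, 2.0, 0.25], 0.1, [0.2, 0.4, 0.6, 0.8], 8 / 255
x_a, delta = fgsm(w, b, x, 1.0, eps)
neg_delta = [-d for d in delta]
neg_grad_at_xa = [-g for g in input_grad(w, b, x_a, 1.0)]
# -delta_a points roughly along the negative loss gradient at x_a.
alignment = cosine(neg_delta, neg_grad_at_xa)
# Adding -delta_a back to x_a recovers x, driving the assumed loss to about zero.
recovered_loss = purification_loss([a + nd for a, nd in zip(x_a, neg_delta)], x)
```

In this toy setting the cosine between $-\delta_a$ and the negative gradient at $x_a$ is positive (about 0.81), and a correction equal to $-\delta_a$ drives the assumed purification loss to near zero, whereas leaving $x_a$ unchanged costs $\epsilon^2$.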