Zero-Sacrifice Persistent-Robustness Adversarial Defense for Pre-Trained Encoders

Zhuxin Lei, Ziyuan Yang, Yi Zhang

TL;DR

ZePAD addresses the vulnerability of public SSL encoders to downstream-agnostic adversarial examples (DAEs) by deploying a dual-branch defense that preserves or even improves benign performance while boosting adversarial robustness with a single round of adversarial fine-tuning. A diverse MPAE-Branch and a benign BMP-Branch feed into a Robust Federal Decision Mechanism that fuses branch confidences, enabling persistent robustness across downstream tasks. Across 11 SSL methods and 6 datasets, ZePAD yields substantial gains in both Benign Accuracy and Robust Accuracy, and it also enables DAE detection from confidence signals without extra training. The approach extends to multimodal encoders (e.g., CLIP), indicating broad applicability for zero-sacrifice defenses in real-world settings.

Abstract

The widespread use of publicly available pre-trained encoders from self-supervised learning (SSL) has exposed a critical vulnerability: their susceptibility to downstream-agnostic adversarial examples (DAEs), which are crafted without knowledge of the downstream tasks yet are capable of misleading downstream models. While several defense methods have been explored recently, they rely primarily on task-specific adversarial fine-tuning, which inevitably limits generalizability, causes catastrophic forgetting, and degrades benign performance. Unlike previous work, we propose a more rigorous defense goal that requires only a single tuning for diverse downstream tasks to defend against DAEs and preserve benign performance. To achieve this goal, we introduce Zero-Sacrifice Persistent-Robustness Adversarial Defense (ZePAD), which is inspired by the inherent sensitivity of neural networks to data characteristics. Specifically, ZePAD is a dual-branch structure. The Multi-Pattern Adversarial Enhancement Branch (MPAE-Branch) uses two adversarially fine-tuned encoders to strengthen adversarial resistance, while the Benign Memory Preservation Branch (BMP-Branch) is trained on local data to ensure that adversarial robustness does not compromise benign performance. Surprisingly, we find that ZePAD can directly detect DAEs by evaluating branch confidence, without introducing any adversarial example identification task during training. Notably, by enriching feature diversity, our method enables a single adversarial fine-tuning to defend against DAEs across downstream tasks, thereby achieving persistent robustness. Extensive experiments on 11 SSL methods and 6 datasets validate its effectiveness. In certain cases, it achieves a 29.20% improvement in benign performance and a 73.86% gain in adversarial robustness, highlighting its zero-sacrifice property.
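
The abstract states that the two branches are combined through their confidences and that DAEs can be detected from branch confidence alone. The Python sketch below illustrates that general idea only; it is not the paper's decision mechanism. The function names (`fuse_branches`, `looks_adversarial`), the confidence-proportional weighting, and the `gap_threshold` value are all assumptions made for illustration, and the branch logits are taken as given rather than produced by real encoders.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(adv_logits, benign_logits):
    """Hypothetical confidence-weighted fusion of two branch outputs.

    Each branch's confidence is its top softmax probability; the fused
    distribution weights each branch in proportion to that confidence.
    This weighting rule is an assumption, not the rule from the paper.
    """
    p_adv = softmax(adv_logits)
    p_ben = softmax(benign_logits)
    c_adv, c_ben = max(p_adv), max(p_ben)
    w_adv = c_adv / (c_adv + c_ben)
    fused = [w_adv * a + (1.0 - w_adv) * b for a, b in zip(p_adv, p_ben)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, c_adv, c_ben


def looks_adversarial(c_adv, c_ben, gap_threshold=0.3):
    """Hypothetical detector: flag inputs whose branch confidences diverge.

    The threshold is an arbitrary illustrative value.
    """
    return abs(c_adv - c_ben) > gap_threshold


if __name__ == "__main__":
    # Both branches agree and are confident: treated as benign here.
    label, c_adv, c_ben = fuse_branches([4.0, 0.0, 0.0], [3.0, 0.5, 0.0])
    print(label, looks_adversarial(c_adv, c_ben))

    # The benign branch is nearly uniform while the adversarially
    # tuned branch stays confident: flagged by this toy rule.
    label, c_adv, c_ben = fuse_branches([4.0, 0.0, 0.0], [0.2, 0.3, 0.1])
    print(label, looks_adversarial(c_adv, c_ben))
```

In this toy rule, a large confidence gap between the branches is the only detection signal. The confidence distributions actually observed for each branch are the subject of Figure 2.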

Paper Structure

This paper contains 44 sections, 13 equations, 7 figures, 16 tables, 1 algorithm.

Figures (7)

  • Figure 1: The overview of the proposed method. The red and green arrows denote the attack flow and normal flow in the inference stage, respectively.
  • Figure 2: The confidence distributions of each branch under different conditions. C-C indicates fine-tuning on CIFAR10 and downstream testing on CIFAR10; C-S indicates fine-tuning on CIFAR10 and downstream testing on STL10.
  • Figure 3: t-SNE visualization of feature spaces for the baseline and ZePAD under benign and adversarial examples. Different colors indicate different classes.
  • Figure S1: BA (%) and RA (%) of our ZePAD and the baseline method under four attack methods in the white-box scenario.
  • Figure S2: BA (%) and RA (%) across different numbers of AdvAu-Models.
  • ...and 2 more figures