How to Enhance Downstream Adversarial Robustness (almost) without Touching the Pre-Trained Foundation Model?

Meiqi Liu, Zhuoqun Huang, Yue Xing

TL;DR

The paper tackles the high computational cost of adversarial training for foundation models by proposing CRoPD, a robust auto-encoder that pre-processes downstream data and is trained without access to the foundation model's weights. It proves a bound showing that the downstream adversarial loss is controlled by the downstream clean loss plus a scaled adversarial contrastive loss, and it introduces a practical algorithm that optimizes a reconstruction loss jointly with the adversarial contrastive loss. Experiments on CIFAR-2/10/100, SVHN, Imagenette, and Tiny-ImageNet show that CRoPD substantially improves downstream robustness under PGD and AutoAttack, with only a modest drop in clean accuracy and far lower computational cost than full robust fine-tuning. The work offers a cost-efficient path to robust downstream performance and suggests that feature robustness learned via adversarial contrastive learning is a key driver of downstream robustness, with potential extensions to NLP settings.

Abstract

With the rise of powerful foundation models, the pre-training/fine-tuning paradigm has become increasingly popular: a foundation model is pre-trained on a huge amount of data from various sources, and downstream users then only need to fine-tune and adapt it to specific downstream tasks. However, due to the high computational complexity of adversarial training, it is not feasible to fine-tune the foundation model to improve its robustness on the downstream task. Motivated by this challenge, we aim to improve downstream robustness without updating or accessing the weights of the foundation model. Inspired by existing literature on robustness inheritance (Kim et al., 2020), we identify, through theoretical investigation, a close relationship between robust contrastive learning and the adversarial robustness of supervised learning. To further validate and utilize this theoretical insight, we design a simple yet effective robust auto-encoder as a data pre-processing step applied before the data are fed into the foundation model. The proposed approach has zero access to the foundation model when training the robust auto-encoder. Extensive experiments demonstrate the effectiveness of the proposed method in improving the robustness of downstream tasks, verifying the connection between feature robustness (implied by a small adversarial contrastive loss) and the robustness of the downstream task.
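As a rough illustration of how a reconstruction term and a contrastive term between clean and perturbed features can be combined into one objective, here is a minimal NumPy sketch. It is not the authors' implementation: the one-layer `tanh` encoder, the linear decoder, the InfoNCE-style form of the contrastive term, the weight `lam`, the temperature, and the budget `eps = 8/255` are all assumptions made for illustration. The perturbation is a random sign pattern standing in for a real attack such as PGD.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.5):
    """InfoNCE-style loss: row i of z_a should match row i of z_b."""
    # L2-normalize so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (n, n) similarity matrix
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return -np.mean(np.diag(log_prob))

def joint_loss(x, x_pert, w_enc, w_dec, lam=1.0):
    """Reconstruction loss plus lam times the contrastive loss (hypothetical form)."""
    z_clean = np.tanh(x @ w_enc)      # features of clean inputs
    z_pert = np.tanh(x_pert @ w_enc)  # features of perturbed inputs
    recon = z_pert @ w_dec            # decode the perturbed features
    recon_loss = np.mean((recon - x) ** 2)  # compare against the clean input
    return recon_loss + lam * info_nce(z_clean, z_pert)

rng = np.random.default_rng(0)
x = rng.uniform(size=(8, 16))  # toy batch: 8 inputs, 16 dimensions
eps = 8 / 255                  # assumed L-infinity budget
# Random sign perturbation: a stand-in, NOT an adversarial attack.
x_pert = x + eps * rng.choice([-1.0, 1.0], size=x.shape)
w_enc = rng.normal(scale=0.1, size=(16, 4))
w_dec = rng.normal(scale=0.1, size=(4, 16))
loss = joint_loss(x, x_pert, w_enc, w_dec, lam=1.0)
print(f"joint loss: {loss:.4f}")
```

Note that no foundation model appears anywhere in this objective, which is consistent with the abstract's claim that the auto-encoder is trained with zero access to it.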

Paper Structure

This paper contains 34 sections, 4 theorems, 42 equations, 2 figures, 16 tables.

Key Result

Theorem 1

Assume that for all $x$, the encoder $f_{\mathrm{en}}$ generates a robust latent feature $z$ such that $\| f_{\mathrm{en}}(x^{\mathrm{adv}}) - f_{\mathrm{en}}(x) \| \leq \eta_1$, where $\eta_1$ is small, and that for all pairs $(x_1, y_1)$, $(x_2, y_2)$ with $y_1 \neq y_2$, it holds that $\| f_{\mathrm{en}} \ldots$ [...] where $L_{\mathrm{con}}(f_{\mathrm{en}})$ is the robust contrastive loss defined in (eqn:Lcon).
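To make the first assumption concrete, the sketch below measures the feature shift $\| f_{\mathrm{en}}(x^{\mathrm{adv}}) - f_{\mathrm{en}}(x) \|$ on a batch and compares it with a value of $\eta_1$ that can be computed in closed form for a linear encoder. Everything here is an illustrative assumption rather than the paper's setup: the linear map `W` stands in for $f_{\mathrm{en}}$, the $\ell_\infty$ budget `eps = 8/255` is chosen arbitrarily, and the perturbations are random signs, not adversarial examples. For a linear encoder, each output coordinate can move by at most `eps` times the $\ell_1$ norm of the corresponding column of `W`, which gives the bound used below.

```python
import numpy as np

def feature_shift(encode, x, x_adv):
    """Per-example L2 distance between clean and perturbed features."""
    return np.linalg.norm(encode(x_adv) - encode(x), axis=1)

def linear_shift_bound(W, eps):
    """Upper bound on the L2 feature shift of x -> x @ W under an L-inf budget eps."""
    # Coordinate j moves by at most eps * ||W[:, j]||_1.
    col_l1 = np.abs(W).sum(axis=0)
    return eps * np.sqrt(np.sum(col_l1 ** 2))

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))   # hypothetical linear encoder
x = rng.normal(size=(32, 16))  # toy batch
eps = 8 / 255                  # assumed perturbation budget
# Random sign perturbations inside the L-inf ball (not a real attack).
x_adv = x + eps * rng.choice([-1.0, 1.0], size=x.shape)

shifts = feature_shift(lambda v: v @ W, x, x_adv)
eta_1 = linear_shift_bound(W, eps)
print(f"largest observed shift: {shifts.max():.4f}, eta_1: {eta_1:.4f}")
```

For a nonlinear encoder no such closed form is available in general, so the observed shift under a strong attack would only give an empirical estimate of $\eta_1$.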

Figures (2)

  • Figure 1: Using a robust auto-encoder to pre-process the downstream data. Given the pre-trained foundation model, we train a robust auto-encoder by adversarial training with an adversarial contrastive loss. The auto-encoder pre-processes the downstream data, and the pre-processed inputs are then fed into the foundation model.
  • Figure 2: Sample image reconstructions of each dataset. Top row: original images, middle row: ARAE reconstructions, bottom row: CRoPD reconstructions. Columns correspond to different datasets. ARAE reconstructions are sharper as expected, while CRoPD reconstructions are purified and more robust for downstream tasks.

Theorems & Definitions (8)

  • Theorem 1
  • Proposition 1
  • proof: Proof of Theorem \ref{thm:downstream}
  • Proposition 2
  • proof: Proof of Proposition \ref{pro:decoder_robust}
  • Proposition 3
  • proof: Proof of Proposition \ref{pro:down_lipschitz}
  • proof: Proof of Proposition \ref{pro:adv loss}