Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers

Weijie Zheng; Xingjun Ma; Hanxun Huang; Zuxuan Wu; Yu-Gang Jiang

TL;DR

This work investigates the vulnerability transfer from large pre-trained Vision Transformers to downstream models via Downstream Transfer Attack (DTA). It introduces Average Token Cosine Similarity (ATCS) as both the attack objective and a transferability indicator, and develops a layer-aware strategy to identify the most transferable vulnerabilities across ViT layers. Empirical results across multiple pre-training methods, fine-tuning schemes, model scales, and downstream tasks show that DTA achieves high attack success rates (often >90%) and that PETL approaches like LoRA and AdaptFormer can exacerbate downstream fragility. The study also demonstrates that DTA-informed adversarial training can improve robustness against downstream transfer attacks, highlighting practical implications for defense in real-world deployments.

Abstract

With the advancement of vision transformers (ViTs) and self-supervised learning (SSL) techniques, pre-trained large ViTs have become the new foundation models for computer vision applications. However, studies have shown that, like convolutional neural networks (CNNs), ViTs are also susceptible to adversarial attacks, where subtle perturbations in the input can fool the model into making false predictions. This paper studies the transferability of such an adversarial vulnerability from a pre-trained ViT model to downstream tasks. We focus on sample-wise transfer attacks and propose a novel attack method termed Downstream Transfer Attack (DTA). For a given test image, DTA leverages a pre-trained ViT model to craft the adversarial example and then applies the adversarial example to attack a fine-tuned version of the model on a downstream dataset. During the attack, DTA identifies and exploits the most vulnerable layers of the pre-trained model guided by a cosine similarity loss to craft highly transferable attacks. Through extensive experiments with pre-trained ViTs by 3 distinct pre-training methods, 3 fine-tuning schemes, and across 10 diverse downstream datasets, we show that DTA achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a huge margin. When used with adversarial training, the adversarial examples generated by our DTA can significantly improve the model's robustness to different downstream transfer attacks.
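The abstract mentions a cosine similarity loss, and the TL;DR names it Average Token Cosine Similarity (ATCS), but this summary does not give the exact definition. The following is a minimal sketch, assuming ATCS is the mean cosine similarity between corresponding token features of a clean image and its adversarial counterpart, taken from the same layer of the pre-trained ViT. The function name, argument names, and the `(num_tokens, dim)` input layout are illustrative assumptions, not the authors' code.

```python
import numpy as np


def average_token_cosine_similarity(clean_tokens, adv_tokens, eps=1e-12):
    """Mean cosine similarity between matching tokens (assumed ATCS form).

    Both inputs have shape (num_tokens, dim): one feature vector per token,
    taken from the same layer for the clean and the perturbed image.
    """
    clean = np.asarray(clean_tokens, dtype=np.float64)
    adv = np.asarray(adv_tokens, dtype=np.float64)
    if clean.shape != adv.shape or clean.ndim != 2:
        raise ValueError("expected two arrays of identical shape (num_tokens, dim)")

    # Per-token dot products and norm products.
    dots = np.sum(clean * adv, axis=1)
    norms = np.linalg.norm(clean, axis=1) * np.linalg.norm(adv, axis=1)

    # eps guards against division by zero for all-zero token vectors.
    return float(np.mean(dots / np.maximum(norms, eps)))
```

Under this assumed form, identical features give a value of 1 and fully reversed features give -1, so an attack that lowers the value is pushing the adversarial token features away from the clean ones.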

Paper Structure (34 sections, 4 equations, 8 figures, 8 tables, 1 algorithm)

This paper contains 34 sections, 4 equations, 8 figures, 8 tables, 1 algorithm.

Introduction
Related Work
Pre-training and Fine-tuning
Transferable Adversarial Attacks
Downstream Transfer Attacks
Downstream Transfer Attack
Notations
Threat Model
Relation to Existing Threat Models
Methodology
Average Token Cosine Similarity
Finding the Most Vulnerable Layers
Experiments
Experimental Setup
Models and Datasets
...and 19 more sections

Figures (8)

Figure 1: A conceptual illustration of downstream transfer attack.
Figure 2: ATCS vs. downstream transfer Attack Success Rate (ASR). The source ViT-base models were pre-trained by MAE, DINO, and AugReg, while their downstream (target) models were fully fine-tuned on CIFAR-10.
Figure 3: The ATCS values of adversarial examples generated at different steps of Eq. (\ref{eq:optimize}) for CIFAR-10 test images. Each line corresponds to attacking a particular layer of the pre-trained model. Figures \ref{subfig:mae}, \ref{subfig:dino}, and \ref{subfig:augreg} show the source ViT-base models pre-trained on ImageNet by MAE, DINO, and AugReg, respectively. Their downstream models were fully fine-tuned on CIFAR-10.
Figure 4: The ASR (%) on adversarially pre-trained models. All downstream models are fine-tuned from the XCiT-base model adversarially pre-trained on ImageNet-1k with $\epsilon = 8/255$.
Figure 5: The ASR (%) of different loss functions on MAE, DINO, and AugReg pre-trained models. The attack layer was set to 1, 8, and 8 for MAE, DINO, and AugReg, respectively.
...and 3 more figures
