Task-Agnostic Attacks Against Vision Foundation Models

Brian Pulfer, Yury Belousov, Vitaliy Kinakh, Teddy Furon, Slava Voloshynovskiy

TL;DR

This work studies the security of Vision Foundation Models (VFMs) by introducing Task-Agnostic Attacks (TAAs), which perturb the backbone's feature representation rather than a task-specific output. The authors formulate TAAs in feature space with mean-centered cosine-similarity losses and optimize them with Projected Gradient Descent under a distortion budget. They evaluate the impact on a range of downstream tasks: classification, segmentation, retrieval, zero-shot tasks, and VLM-based captioning/VQA. TAAs cause substantial cross-task degradation, with relative efficiency approaching that of Task-Specific Attacks (TSAs) for several backbones, while transferability to unseen models remains limited but non-negligible. The results expose significant cross-task vulnerabilities of VFMs and underscore the need for defenses against non-task-specific perturbations, such as adversarial training, to safeguard open-source foundation-model ecosystems in real-world applications.
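The feature-space attack described above can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: the "backbone" is a hypothetical linear map `W` (so the gradient can be written by hand instead of using autodiff on a real VFM), the mean feature `mu` is simply passed in (how the paper computes the mean is not stated here), and the distortion budget is assumed to be an L2 ball of radius `eps`. All function names are invented for this sketch, and clipping to a valid pixel range is omitted.

```python
import math
import random


def matvec(W, x):
    """Toy linear 'backbone' (a stand-in for a real VFM): features = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def centered_cosine(f_clean, f_adv, mu):
    """Cosine similarity between two feature vectors after subtracting mu."""
    a = [p - m for p, m in zip(f_clean, mu)]
    b = [p - m for p, m in zip(f_adv, mu)]
    return sum(p * q for p, q in zip(a, b)) / (norm(a) * norm(b))


def input_gradient(W, f_clean, f_adv, mu):
    """Analytic gradient of centered_cosine w.r.t. the input (linear model only)."""
    a = [p - m for p, m in zip(f_clean, mu)]
    b = [p - m for p, m in zip(f_adv, mu)]
    na, nb = norm(a), norm(b)
    ab = sum(p * q for p, q in zip(a, b))
    # d cos / d b = a / (|a||b|) - (a.b) b / (|a||b|^3)
    g_feat = [p / (na * nb) - ab * q / (na * nb ** 3) for p, q in zip(a, b)]
    # Chain rule through the linear map: W^T g_feat
    return [sum(W[i][j] * g_feat[i] for i in range(len(W)))
            for j in range(len(W[0]))]


def project_l2(delta, eps):
    """Project delta onto the L2 ball of radius eps (assumed budget type)."""
    n = norm(delta)
    return list(delta) if n <= eps else [d * eps / n for d in delta]


def pgd_feature_attack(W, x, mu, eps, steps=100, step_size=None, seed=0):
    """Minimize the centered cosine between clean and perturbed features."""
    rng = random.Random(seed)
    step_size = eps / 10 if step_size is None else step_size
    f_clean = matvec(W, x)

    def cos_at(delta):
        return centered_cosine(
            f_clean, matvec(W, [xi + di for xi, di in zip(x, delta)]), mu)

    # Random start: at delta = 0 the cosine is exactly 1 and its gradient vanishes.
    delta = project_l2([rng.gauss(0.0, 1.0) for _ in x], 0.1 * eps)
    best_delta, best_cos = delta, cos_at(delta)
    for _ in range(steps):
        f_adv = matvec(W, [xi + di for xi, di in zip(x, delta)])
        g = input_gradient(W, f_clean, f_adv, mu)
        g_norm = norm(g)
        if g_norm == 0.0:
            break
        # Normalized descent step on the similarity, then projection.
        delta = project_l2(
            [d - step_size * gi / g_norm for d, gi in zip(delta, g)], eps)
        cos = cos_at(delta)
        if cos < best_cos:
            best_delta, best_cos = delta, cos
    return best_delta, best_cos


# Toy usage: identity 'backbone', mean feature mu, L2 budget eps.
W = [[1.0, 0.0], [0.0, 1.0]]
delta, cos = pgd_feature_attack(W, x=[2.0, 1.0], mu=[1.0, 1.0], eps=0.5)
print(f"centered cosine after attack: {cos:.3f}, ||delta|| = {norm(delta):.3f}")
```

Note that no downstream head or label appears anywhere in the loss, which is what makes the perturbation task-agnostic in this sketch. With a real backbone, the hand-written gradient would be replaced by automatic differentiation.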

Abstract

The study of security in machine learning mainly focuses on downstream task-specific attacks, where the adversarial example is obtained by optimizing a loss function specific to the downstream task. At the same time, it has become standard practice for machine learning practitioners to adopt publicly available pre-trained vision foundation models, effectively sharing a common backbone architecture across a multitude of applications such as classification, segmentation, depth estimation, retrieval, question-answering and more. The study of attacks on such foundation models and their impact on multiple downstream tasks remains largely unexplored. This work proposes a general framework that forges task-agnostic adversarial examples by maximally disrupting the feature representation obtained with foundation models. We extensively evaluate the security of the feature representations obtained by popular vision foundation models by measuring the impact of this attack on multiple downstream tasks and its transferability between models.

Paper Structure

This paper contains 34 sections, 2 equations, 10 figures, 11 tables, 1 algorithm.

Figures (10)

  • Figure 1: Adversarial example attacks on the DinoV2 ViT-S model. The Task-Agnostic Attack fools both segmentation and classification, unlike attacks specific to a single downstream task.
  • Figure 2: Schematic representation of classic Task-Specific Attack (left) and proposed Task-Agnostic Attack (right).
  • Figure 3: Our TAA fools the Segment Anything Model (Kirillov et al., 2023). Original (left) and adversarial (right, PSNR = 40 dB) images.
  • Figure 4: Examples of regular (left) and adversarial (right) captions obtained with TAAs attacking the VFM of PaliGemma. PSNR is 40 dB.
  • Figure 5: Comparison of the relative efficiency (equation label eq:RelEff) of TAAs (right) with respect to TSAs (left), averaged over classification and semantic segmentation tasks. TAAs perform comparably to TSAs across models and tasks.
  • ...and 5 more figures
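
The captions above quote a distortion level of PSNR = 40 dB. As a rough aid to intuition, the sketch below converts a PSNR value into a mean squared error and an equivalent L2 norm using the standard definition PSNR = 10 log10(MAX^2 / MSE). Whether the paper parameterizes its budget this way is an assumption, the function names are invented for illustration, and the 3 x 224 x 224 input size is a hypothetical example rather than a value taken from the source.

```python
import math


def mse_from_psnr(psnr_db, max_val=1.0):
    """Invert PSNR = 10 * log10(max_val^2 / MSE) to get the MSE."""
    return max_val ** 2 / 10 ** (psnr_db / 10)


def l2_budget_from_psnr(psnr_db, num_values, max_val=1.0):
    """L2 norm of a perturbation with the given PSNR over num_values entries."""
    return math.sqrt(num_values * mse_from_psnr(psnr_db, max_val))


# 40 dB on a [0, 1] pixel scale corresponds to an RMS error of 0.01 per value.
rms = math.sqrt(mse_from_psnr(40.0))

# Hypothetical 3 x 224 x 224 input (an assumption, not from the source).
eps = l2_budget_from_psnr(40.0, num_values=3 * 224 * 224)
print(f"RMS error: {rms:.4f}, L2 budget: {eps:.2f}")
```

On an 8-bit scale (`max_val=255`), the same 40 dB corresponds to an RMS error of about 2.55 intensity levels per value.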