DeSparsify: Adversarial Attack Against Token Sparsification Mechanisms in Vision Transformers

Oryan Yehezkel, Alon Zolfi, Amit Baras, Yuval Elovici, Asaf Shabtai

TL;DR

This paper presents DeSparsify, an attack on the availability of vision transformers that use token sparsification mechanisms. The attack aims to exhaust the operating system's resources while remaining stealthy.

Abstract

Vision transformers have contributed greatly to advancements in the computer vision domain, demonstrating state-of-the-art performance in diverse tasks (e.g., image classification, object detection). However, their high computational requirements grow quadratically with the number of tokens used. Token sparsification mechanisms have been proposed to address this issue. These mechanisms employ an input-dependent strategy in which uninformative tokens are discarded from the computation pipeline, improving the model's efficiency. However, their dynamism and average-case assumption make them vulnerable to a new threat vector: carefully crafted adversarial examples capable of fooling the sparsification mechanism, resulting in worst-case performance. In this paper, we present DeSparsify, an attack targeting the availability of vision transformers that use token sparsification mechanisms. The attack aims to exhaust the operating system's resources while remaining stealthy. Our evaluation demonstrates the attack's effectiveness on three token sparsification mechanisms and examines both its transferability between them and its effect on GPU resources. To mitigate the impact of the attack, we propose various countermeasures.
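
To make the quadratic-cost argument concrete, the following is a minimal sketch of how the per-block cost of a vision transformer depends on the number of tokens, and of what an attack that keeps every token alive would undo. The cost model is a common back-of-the-envelope approximation (multiply-accumulates for the projections, the attention product, and the MLP) and is not taken from the paper. The function names, the 197-token / 384-dimension / 12-block configuration, and the pruning schedule (keep 70% of the tokens at blocks 3, 6, and 9) are all hypothetical values chosen for illustration.

```python
from typing import Iterable, List


def attention_macs(num_tokens: int, dim: int) -> int:
    """Approximate MACs of the token-mixing step (QK^T plus attention @ V).

    This is the term that grows quadratically with the number of tokens.
    """
    return 2 * num_tokens * num_tokens * dim


def block_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate MACs of one transformer block (assumed cost model)."""
    projections = 4 * num_tokens * dim * dim      # Q, K, V, and output projections
    mlp = 2 * mlp_ratio * num_tokens * dim * dim  # two linear layers of the MLP
    return projections + attention_macs(num_tokens, dim) + mlp


def token_schedule(num_tokens: int, depth: int, keep_ratio: float,
                   prune_at: Iterable[int]) -> List[int]:
    """Number of tokens entering each block under a hypothetical pruning schedule."""
    prune_blocks = set(prune_at)
    schedule = []
    n = num_tokens
    for block in range(depth):
        if block in prune_blocks:
            n = max(1, int(n * keep_ratio))  # discard the "uninformative" tokens
        schedule.append(n)
    return schedule


def model_macs(tokens_per_block: Iterable[int], dim: int) -> int:
    """Total approximate MACs over all blocks."""
    return sum(block_macs(n, dim) for n in tokens_per_block)


if __name__ == "__main__":
    dim, depth, tokens = 384, 12, 197  # hypothetical small ViT configuration

    # Average case: the mechanism prunes tokens as designed.
    sparse = token_schedule(tokens, depth, keep_ratio=0.7, prune_at=(3, 6, 9))
    # Worst case: an input on which no token is ever discarded.
    worst = token_schedule(tokens, depth, keep_ratio=1.0, prune_at=())

    saving = 1 - model_macs(sparse, dim) / model_macs(worst, dim)
    print(f"tokens per block (average case): {sparse}")
    print(f"compute saved by sparsification: {saving:.1%}")
```

Under this simplified model, the worst-case schedule is identical to running the model with no sparsification at all, which is the sense in which an input that prevents token removal negates the efficiency gain.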

Paper Structure (27 sections, 10 equations, 6 figures, 15 tables)

Figures (6)

  • Figure 1: Token depth distribution in terms of transformer blocks for a clean (top) and adversarial (bottom) image for three TS mechanisms (b)-(d). The colors indicate the maximum depth each token reaches before being discarded. The adversarial image is crafted using the single-image attack variant (Section \ref{subsec:method:threat_model}), which results in worst-case performance.
  • Figure 2: Distribution of the (a) tokens, (b) attention heads, and (c) blocks for the AdaViT mechanism when tested on clean and adversarial (single-image variant) images.
  • Figure 3: Distribution of activated tokens in each ATS block on clean and adversarial images.
  • Figure 4: GFLOPS transferability results for the single-image variant attack. Ensemble refers to perturbations that were trained on all modules simultaneously.
  • Figure 5: Ablation study examining the effect of the $\lambda$ value on the GFLOPS and accuracy for the ATS.
  • ...and 1 more figure