Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness

Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda, Kento Sasaki

TL;DR

The paper investigates the adversarial robustness of Differential Attention (DA), revealing a fundamental Fragile Principle: DA’s subtractive structure can amplify sensitivity to small perturbations when the two attention branches exhibit negative gradient alignment. The authors develop a depth-aware analysis showing that stacking DA layers induces noise cancellation that can mitigate fragility for small perturbations, but this protection weakens under larger attacks. Empirical studies across ViT/DiffViT and DiffCLIP on multiple datasets corroborate higher attack success rates, stronger negative gradient alignment, and larger local Lipschitz constants for DA compared to standard attention. The work highlights a trade-off between discriminative focus and robustness, suggesting that deeper DA and complementary defenses may be needed to jointly achieve selectivity and stability in future attention mechanisms.

Abstract

Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive structure and thereby reducing contextual hallucination. While this design sharpens task-relevant focus, we show that it also introduces a structural fragility under adversarial perturbations. Our theoretical analysis identifies negative gradient alignment, a configuration encouraged by DA's subtraction, as the key driver of sensitivity amplification, leading to increased gradient norms and elevated local Lipschitz constants. We empirically validate this Fragile Principle through systematic experiments on ViT/DiffViT and evaluations of pretrained CLIP/DiffCLIP, spanning five datasets in total. These results demonstrate higher attack success rates, frequent gradient opposition, and stronger local sensitivity compared to standard attention. Furthermore, depth-dependent experiments reveal a robustness crossover: stacking DA layers attenuates small perturbations via depth-dependent noise cancellation, though this protection fades under larger attack budgets. Overall, our findings uncover a fundamental trade-off: DA improves discriminative focus on clean inputs but increases adversarial vulnerability, underscoring the need to jointly design for selectivity and robustness in future attention mechanisms.
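
The local Lipschitz constant mentioned in the abstract can be probed empirically by perturbing an input within a small ball and recording the largest observed output change per unit of input change. The following is a minimal sketch on a toy single-query model, not the paper's experimental setup: the functions `attend`, `diff_attend`, and `local_lipschitz`, the scalar values, and all example numbers are illustrative assumptions. It assumes the usual DA form $A_1 - \lambda A_2$, so $\lambda = 0$ recovers standard attention.

```python
import math
import random

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(x, keys, values):
    """Single-query softmax attention with scalar values (toy model)."""
    scale = 1.0 / math.sqrt(len(x))
    scores = [scale * sum(xi * ki for xi, ki in zip(x, k)) for k in keys]
    return sum(w * v for w, v in zip(softmax(scores), values))

def diff_attend(x, keys1, keys2, values, lam):
    """Toy differential attention A1 - lam * A2 (assumed form; lam = 0 is standard)."""
    return attend(x, keys1, values) - lam * attend(x, keys2, values)

def local_lipschitz(f, x, radius=1e-2, trials=200, seed=0):
    """Empirical lower bound on the local Lipschitz constant of f at x.

    Samples random directions on a sphere of the given radius and keeps the
    largest ratio |f(x + delta) - f(x)| / ||delta||.
    """
    rng = random.Random(seed)
    fx = f(x)
    best = 0.0
    for _ in range(trials):
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(di * di for di in d))
        delta = [radius * di / norm for di in d]
        fxd = f([xi + di for xi, di in zip(x, delta)])
        best = max(best, abs(fxd - fx) / radius)
    return best

# Hypothetical inputs chosen only for illustration.
x = [0.5, -1.0, 0.25]
keys1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
keys2 = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
values = [1.0, -1.0, 0.5]

L_std = local_lipschitz(lambda z: diff_attend(z, keys1, keys2, values, 0.0), x)
L_da = local_lipschitz(lambda z: diff_attend(z, keys1, keys2, values, 0.8), x)
```

Because the estimate comes from random sampling, it is only a lower bound on the true local constant. Whether `L_da` exceeds `L_std` depends on how the two branches' gradients align at `x`, which is the quantity the paper's analysis focuses on.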

Paper Structure

This paper contains 34 sections, 8 theorems, 29 equations, 5 figures, 3 tables.

Key Result

Lemma 1

Let $\theta$ be the angle between the input gradients of $A_1$ and $A_2$. Then,
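
The lemma statement is cut off in this listing. As a hedged sketch of the kind of identity the setup suggests, and not the paper's exact wording, assume the usual DA form $A_{\mathrm{DA}} = A_1 - \lambda A_2$ with $\lambda > 0$ (the symbol $A_{\mathrm{DA}}$ is introduced here for illustration). Then $\nabla A_{\mathrm{DA}} = \nabla A_1 - \lambda \nabla A_2$, and expanding the squared norm gives the law of cosines:

$$
\|\nabla A_{\mathrm{DA}}\|^2 = \|\nabla A_1\|^2 + \lambda^2 \|\nabla A_2\|^2 - 2\lambda \, \|\nabla A_1\| \, \|\nabla A_2\| \cos\theta .
$$

Under this assumption, $\cos\theta < 0$ makes the cross term positive, so the gradient norm of the difference exceeds what either branch contributes alone. This is consistent with the negative gradient alignment described in the abstract.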

Figures (5)

  • Figure 1: Illustration of the Fragile Principle of Differential Attention. (a) On clean inputs, well-aligned attention maps cancel redundant focus, producing sharp and stable responses. (b) With adversarial perturbations, the gradients (red arrows) of the two attention branches may become negatively aligned, amplifying small input changes and leading to conflicting responses.
  • Figure 2: Attack success rate (ASR) under adversarial examples and patches crafted with PGD. DiffViT and DiffCLIP generally exhibit ASR higher than or comparable to that of standard attention, with the gap most pronounced in small-class datasets (CIFAR and COCO) and narrowing on large-scale ones (ImageNet).
  • Figure 3: Depth-dependent effects of DA. (a, b) Under PGD and AutoAttack, DA is fragile at depths 1 to 2, but ASR drops with depth for $\epsilon{=}1/255$; at $\epsilon{=}4/255$ both converge to high ASR. (c) Under CW-L2, deeper models require larger perturbations to reach 100% ASR. (d) Mean local Lipschitz estimates over all layers rise with depth, indicating higher local sensitivity.
  • Figure 4: Mean Lipschitz estimates of attention layers under input perturbations. Models incorporating DA exhibit the highest values among all attention layers, particularly at layers with larger $\lambda$.
  • Figure 5: Frequency of negative gradient alignment ($\cos(\nabla A_1, \nabla A_2) < 0$) at each layer.
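
The alignment statistic in Figure 5 can be illustrated with a short sketch. It assumes the branch gradients have already been computed and flattened into plain vectors. The helper names (`cosine`, `negative_alignment_rate`, `da_grad_norm`) and the example vectors are hypothetical and do not come from the paper's code.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine of the angle between two flattened gradient vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def negative_alignment_rate(grad_pairs):
    """Fraction of (g1, g2) pairs whose cosine similarity is below zero."""
    negatives = sum(1 for g1, g2 in grad_pairs if cosine(g1, g2) < 0.0)
    return negatives / len(grad_pairs)

def da_grad_norm(g1, g2, lam):
    """Norm of g1 - lam * g2, the input gradient of a toy A1 - lam * A2."""
    diff = [a - lam * b for a, b in zip(g1, g2)]
    return math.sqrt(dot(diff, diff))

# Hypothetical gradient pairs: the first is opposed, the second aligned.
pairs = [([1.0, 0.0], [-1.0, 0.0]), ([1.0, 0.0], [1.0, 0.0])]

rate = negative_alignment_rate(pairs)        # 0.5: one of the two pairs is opposed
opposed = da_grad_norm(*pairs[0], lam=0.5)   # 1.5, larger than ||g1|| = 1
aligned = da_grad_norm(*pairs[1], lam=0.5)   # 0.5, smaller than ||g1|| = 1
```

In this toy case the opposed pair yields a larger combined gradient norm than the aligned pair. That matches the qualitative behavior the Fragile Principle describes, though real attention gradients are high-dimensional and rarely perfectly aligned or opposed.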

Theorems & Definitions (14)

  • Lemma 1
  • Theorem 1: Sensitivity Amplification by Alignment
  • Theorem 2: Relative Sensitivity to Standard Attention
  • Theorem 3: Existence of Amplifying Perturbations
  • Lemma 2
  • Theorem 4: Depth-Dependent Sensitivity of Standard Attention vs. DA
  • Corollary 1: Crossover in Robustness
  • proof
  • proof
  • proof
  • ...and 4 more