Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda, Kento Sasaki
TL;DR
The paper investigates the adversarial robustness of Differential Attention (DA), revealing a fundamental Fragile Principle: DA’s subtractive structure can amplify sensitivity to small perturbations when the two attention branches exhibit negative gradient alignment. The authors develop a depth-aware analysis showing that stacking DA layers induces noise cancellation that can mitigate fragility for small perturbations, but this protection weakens under larger attacks. Empirical studies across ViT/DiffViT and DiffCLIP on multiple datasets corroborate higher attack success rates, stronger negative gradient alignment, and larger local Lipschitz constants for DA compared to standard attention. The work highlights a trade-off between discriminative focus and robustness, suggesting that deeper DA and complementary defenses may be needed to jointly achieve selectivity and stability in future attention mechanisms.
Abstract
Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive structure and thereby reducing contextual hallucination. While this design sharpens task-relevant focus, we show that it also introduces a structural fragility under adversarial perturbations. Our theoretical analysis identifies negative gradient alignment, a configuration encouraged by DA's subtraction, as the key driver of sensitivity amplification, leading to increased gradient norms and elevated local Lipschitz constants. We empirically validate this Fragile Principle through systematic experiments on ViT/DiffViT and evaluations of pretrained CLIP/DiffCLIP, spanning five datasets in total. These results demonstrate higher attack success rates, frequent gradient opposition, and stronger local sensitivity compared to standard attention. Furthermore, depth-dependent experiments reveal a robustness crossover: stacking DA layers attenuates small perturbations via depth-dependent noise cancellation, though this protection fades under larger attack budgets. Overall, our findings uncover a fundamental trade-off: DA improves discriminative focus on clean inputs but increases adversarial vulnerability, underscoring the need to jointly design for selectivity and robustness in future attention mechanisms.
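The link between negative gradient alignment and larger gradient norms can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the paper's analysis: it assumes a subtractive output of the form f1(x) - lam * f2(x) with a positive weight lam, so that the input gradient is g1 - lam * g2 and its squared norm is ||g1||^2 + lam^2 ||g2||^2 - 2 lam <g1, g2>. When the inner product <g1, g2> is negative, the last term adds to the norm instead of reducing it. The vectors, the value of lam, and the helper names below are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))

def cosine(u, v):
    """Cosine similarity, used here as the gradient alignment measure."""
    return dot(u, v) / (norm(u) * norm(v))

def differential_grad_norm(g1, g2, lam):
    """Norm of g1 - lam * g2, the gradient of the assumed subtractive output."""
    return norm([a - lam * b for a, b in zip(g1, g2)])

# Hypothetical branch gradients and subtraction weight (illustrative only).
g1 = [1.0, 0.0]
lam = 0.5
aligned = [1.0, 0.0]    # second branch gradient points the same way as g1
opposed = [-1.0, 0.0]   # second branch gradient points the opposite way

norm_aligned = differential_grad_norm(g1, aligned, lam)  # |1 - 0.5| = 0.5
norm_opposed = differential_grad_norm(g1, opposed, lam)  # |1 + 0.5| = 1.5

print(cosine(g1, aligned), norm_aligned)
print(cosine(g1, opposed), norm_opposed)
```

In this toy setting, aligned branch gradients partially cancel (norm 0.5, below the single-branch norm of 1.0), while opposed gradients add up (norm 1.5). A larger gradient norm means a larger first-order change in the output for a perturbation of fixed size, which is the sense in which subtraction can amplify sensitivity.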
