Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers
Fatemeh Nourilenjan Nokabadi, Jean-François Lalonde, Christian Gagné
TL;DR
This study examines the reproducibility of adversarial attacks on transformer-based visual trackers across multiple benchmarks (VOT2022ST, UAV123, GOT10k) and two output types (bounding boxes and binary masks). It applies four attacks (CSA, IoU, SPARK, RTAA) in white-box and black-box settings to both transformer and non-transformer trackers, and finds that binary mask predictions are generally more susceptible than bounding boxes and that white-box attacks are more potent against transformer outputs. Deeper transformer backbones with cross-attention (e.g., MixFormer variants, ROMTrack) can exhibit greater inherent robustness. Overall, existing attacks do not fully break these models, which points to a need for new attack methods tailored to modern trackers. The results offer practical guidance for designing more robust transformer trackers and underline the importance of developing stronger white-box and black-box adversarial techniques for tracking. The work also provides code to reproduce the experiments and to support benchmarking and further research on adversarial robustness in tracking.
Abstract
Transformer networks have recently been integrated into object tracking pipelines and have demonstrated strong performance on the latest benchmarks. This paper examines how transformer trackers behave under adversarial attacks and how different attacks perform on tracking datasets as their parameters change. We conducted a series of experiments to evaluate the effectiveness of existing adversarial attacks on object trackers with transformer and non-transformer backbones. We experimented on seven trackers: three transformer-based and four that leverage other architectures. These trackers are tested against four recent attack methods to assess their performance and robustness on the VOT2022ST, UAV123 and GOT10k datasets. Our empirical study evaluates the adversarial robustness of object trackers on bounding box versus binary mask predictions, and the attack methods at different perturbation levels. Interestingly, we found that altering the perturbation level may not significantly affect the overall object tracking results after the attack. Similarly, the sparsity and imperceptibility of the attack perturbations may remain stable as the perturbation level shifts. By applying a specific attack on all transformer trackers, we show that newer transformer trackers with stronger cross-attention modeling achieve greater adversarial robustness on tracking datasets such as VOT2022ST and GOT10k. Our results also indicate the need for new attack methods that can effectively challenge the latest transformer trackers. The code necessary to reproduce this study is available at https://github.com/fatemehN/ReproducibilityStudy.
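To make the bounding box versus binary mask comparison concrete, the following is a minimal sketch of how the overlap between a prediction and its ground truth might be scored for each output type. It assumes intersection-over-union as the overlap measure, boxes in (x, y, w, h) format, and masks as nested lists of 0/1. The function names `box_iou` and `mask_iou` are illustrative and are not taken from the study's repository.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes, each given as (x, y, w, h) (assumed format)."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def mask_iou(m1, m2):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    inter = 0
    union = 0
    for row1, row2 in zip(m1, m2):
        for p, q in zip(row1, row2):
            inter += 1 if (p and q) else 0  # pixel set in both masks
            union += 1 if (p or q) else 0   # pixel set in at least one mask
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical clean vs. attacked predictions for a single frame.
    print(box_iou((0, 0, 2, 2), (1, 1, 2, 2)))
    print(mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```

Under these assumptions, a drop in the per-frame score between clean and attacked predictions gives a simple view of how much an attack degrades each output type.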
