TASAR: Transfer-based Attack on Skeletal Action Recognition

Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Ajian Liu, Xiaoshuai Hao, Xingxing Wei, Meng Wang, He Wang

TL;DR

The paper addresses the vulnerability of skeletal action recognition (S-HAR) models to transfer-based adversarial attacks in black-box settings. It introduces TASAR, the first transfer-based attack tailored for S-HAR, built on a post-train Bayesian surrogate to smooth the loss landscape and a dual Bayesian optimization augmented with motion gradients to disrupt spatial-temporal coherence. Key contributions include the post-train Bayesian attack framework, the post-train dual Bayesian motion attack, and the RobustBenchHAR benchmark with extensive evaluations across models, datasets, and defenses. Empirically, TASAR achieves superior transferability and robustness compared to state-of-the-art attacks, highlighting its utility for robustness evaluation and the need for defenses against transferable adversarial perturbations in skeletal action recognition.

Abstract

Skeletal sequence data, as a widely employed representation of human actions, are crucial in Human Activity Recognition (HAR). Recently, adversarial attacks have been proposed in this area, which expose potential security concerns and, more importantly, provide a good tool for testing model robustness. Within this line of research, the transfer-based attack is an important tool because it mimics the real-world scenario where an attacker has no knowledge of the target model, yet it remains under-explored in Skeleton-based HAR (S-HAR). Consequently, existing S-HAR attacks exhibit weak adversarial transferability, and the reason remains largely unknown. In this paper, we investigate this phenomenon by characterizing the loss function. We find that one prominent indicator of poor transferability is the low smoothness of the loss function. Led by this observation, we improve transferability by properly smoothing the loss when computing the adversarial examples. This leads to the first Transfer-based Attack on Skeletal Action Recognition, TASAR. TASAR explores the smoothed model posterior of pre-trained surrogates, which is achieved by a new post-train Dual Bayesian optimization strategy. Furthermore, unlike existing transfer-based methods, which overlook the temporal coherence within sequences, TASAR incorporates motion dynamics into the Bayesian attack, effectively disrupting the spatial-temporal coherence of S-HARs. For exhaustive evaluation, we build the first large-scale robust S-HAR benchmark, comprising 7 S-HAR models, 10 attack methods, 3 S-HAR datasets and 2 defense models. Extensive results demonstrate the superiority of TASAR. Our benchmark enables easy comparisons for future studies, with the code available at https://github.com/yunfengdiao/Skeleton-Robustness-Benchmark.

Paper Structure (34 sections, 20 equations, 9 figures, 8 tables, 1 algorithm)

Figures (9)

  • Figure 1: A high-level illustration of our proposed method. Results marked with a 'check mark' ($\surd$) indicate superior performance compared to those marked with a 'cross' ($\times$). Spatial attack: treats each frame independently. Spatial-temporal Attack: integrates temporal motion gradients to disrupt the spatial-temporal coherence of S-HAR models.
  • Figure 2: Comparison of loss landscapes of trained models. The $x$ and $y$ axes represent two random direction vectors sampled from a Gaussian distribution; the model's parameters are perturbed along these directions to assess the sensitivity of the model's loss function. The $z$ axis represents the loss value. More details can be found in [loss_landscape]. BA denotes the Bayesian Attack proposed in [BA]. PB denotes post-train Bayesian optimization, and P-DB denotes the improved post-train Dual Bayesian optimization. The loss landscape optimized by post-train Dual Bayesian is significantly smoother than those of vanilla post-train Bayesian and baseline methods. More visualizations can be found in the Appendix ('Additional Experimental').
  • Figure 3: Comparisons with ensemble and Bayesian attacks. We calculate the model size and evaluate the average white-box attack success rate (WASR) and black-box attack success rate (BASR) on the HDM05, NTU60, and NTU120 datasets.
  • Figure 4: Under a targeted attack by TASAR, the ground truth label 'Throw' can be misclassified as 'Lie down'. The semantic differences between ground truth labels and target labels are large.
  • Figure 5: Ablation experiments on the motion gradient. 'MG'/'No MG' indicates whether the motion gradient is used in TASAR.
  • ...and 4 more figures
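
The loss-landscape probe described in the Figure 2 caption (perturb the parameters along two random Gaussian directions and record the loss at each grid point) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: `toy_loss`, `loss_surface`, the logistic-regression stand-in model, the synthetic data, and the grid parameters `span` and `steps` are all hypothetical names and choices introduced here.

```python
import math
import random


def toy_loss(params, data):
    """Mean logistic loss of a linear model (assumed stand-in for a trained classifier)."""
    total = 0.0
    for features, label in data:  # label is +1 or -1
        margin = label * sum(p * x for p, x in zip(params, features))
        # Numerically stable form of log(1 + exp(-margin)).
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(data)


def loss_surface(params, data, loss_fn, span=1.0, steps=11, seed=0):
    """Evaluate loss_fn on a 2-D grid around params.

    The grid axes are two random directions drawn from a standard Gaussian,
    matching the x/y axes of the caption; the returned values are the z axis.
    """
    rng = random.Random(seed)
    d1 = [rng.gauss(0.0, 1.0) for _ in params]
    d2 = [rng.gauss(0.0, 1.0) for _ in params]
    coords = [-span + 2.0 * span * k / (steps - 1) for k in range(steps)]
    surface = [
        [
            loss_fn([p + a * u + b * v for p, u, v in zip(params, d1, d2)], data)
            for b in coords
        ]
        for a in coords
    ]
    return coords, surface


# Synthetic data and zero-initialised parameters, purely for illustration.
rng = random.Random(1)
data = [([rng.gauss(0.0, 1.0) for _ in range(4)], rng.choice([-1, 1])) for _ in range(32)]
params = [0.0] * 4
coords, surface = loss_surface(params, data, toy_loss)
```

Plotting `surface` over `coords` gives a landscape of the kind shown in the figure; a flatter, less jagged surface corresponds to the smoother loss that the caption attributes to post-train Dual Bayesian optimization.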