Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack

Yunfeng Diao; He Wang; Tianjia Shao; Yong-Liang Yang; Kun Zhou; David Hogg; Meng Wang

TL;DR

This work shows that skeleton-based HAR systems are vulnerable to black-box adversarial attacks, in which the attacker sees only the model's inputs and outputs. It introduces BASAR, a method that crafts on-manifold adversarial motions via Guided Manifold Walk. It demonstrates that on-manifold adversarial samples are common and perceptually plausible, challenging the belief that adversarial samples exist only off the data manifold. To counter this threat, the authors propose mixed manifold-based adversarial training (MMAT), which jointly leverages on- and off-manifold adversarial samples to improve robustness without sacrificing accuracy on clean data. The approach is validated across multiple models and large-scale datasets and is supported by perceptual studies. It has practical implications for the security of HAR systems and suggests a general defense framework for time-series tasks.

Abstract

Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g., self-driving cars, where safety and lives are at stake. Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks. However, the proposed attacks require full knowledge of the attacked classifier, which is overly restrictive. In this paper, we show that such threats indeed exist even when the attacker only has access to the input/output of the model. To this end, we propose the very first black-box adversarial attack approach in skeleton-based HAR, called BASAR. BASAR explores the interplay between the classification boundary and the natural motion manifold. To the best of our knowledge, this is the first time the data manifold has been introduced in adversarial attacks on time series. Via BASAR, we find that on-manifold adversarial samples are extremely deceitful and rather common in skeletal motions, in contrast to the common belief that adversarial samples only exist off-manifold. Through exhaustive evaluation, we show that BASAR can deliver successful attacks across classifiers, datasets, and attack modes. Through these attacks, BASAR helps identify the potential causes of the model vulnerability and provides insights on possible improvements. Finally, to mitigate the newly identified threat, we propose a new adversarial training approach that leverages the sophisticated distributions of on/off-manifold adversarial samples, called mixed manifold-based adversarial training (MMAT). MMAT can successfully help defend against adversarial attacks without compromising classification accuracy.
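
The interplay between a classification boundary and a data manifold can be pictured with a toy 2D example. The sketch below is not BASAR and is not taken from the paper: it is a minimal decision-based walk that assumes a hypothetical hard-label classifier (`classify`), uses the unit circle as a stand-in manifold (`project`), and accepts a candidate only if it stays misclassified and moves closer to the attacked point. All names, step sizes, and the acceptance rule are illustrative assumptions.

```python
import numpy as np

def classify(p):
    # Hypothetical hard-label black box: the attacker sees only this class index.
    return int(p[0] > 0.5)

def project(p):
    # Stand-in for manifold projection: nearest point on the unit circle.
    n = np.linalg.norm(p)
    return p / n if n > 0 else p

def guided_walk(x, x_adv, n_iter=200, noise=0.1, aim=0.1, seed=0):
    """Toy walk: perturb randomly, step toward x, project, keep if still adversarial."""
    rng = np.random.default_rng(seed)
    label = classify(x)
    assert classify(x_adv) != label, "x_adv must start out misclassified"
    for _ in range(n_iter):
        dist = np.linalg.norm(x_adv - x)
        # Random perturbation, scaled by the current distance to x.
        cand = x_adv + noise * dist * rng.standard_normal(x.shape)
        # Small step straight toward the attacked point x.
        cand = cand + aim * (x - cand)
        # Pull the candidate back onto the toy manifold.
        cand = project(cand)
        # One query to the black box; accept only closer, still-adversarial points.
        if classify(cand) != label and np.linalg.norm(cand - x) < dist:
            x_adv = cand
    return x_adv

x = np.array([np.cos(0.2), np.sin(0.2)])   # attacked point, on the circle
start = np.array([-1.0, 0.0])              # on-manifold point of the other class
result = guided_walk(x, start)
```

By construction, `result` remains misclassified relative to `x`, stays on the unit circle, and is never farther from `x` than `start`. A real attack on skeletal motion would replace both the classifier and the projection with domain-specific components.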

Paper Structure (33 sections, 12 equations, 18 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Skeleton-based Activity Recognition
Adversarial Attack
Adversarial Training
Methodology
Guided Manifold Walk
Random Exploration
Aimed Probing
Manifold Projection
Mixed On-manifold Adversarial Training
Attack Experiments
Settings
Evaluation Metrics
Attack Evaluation
...and 18 more sections

Figures (18)

Figure 1: An abstract 2D illustration of BASAR. $\mathbf{x}$ is the attacked motion. $\mathbf{x}'_k$ is the ideal adversarial sample in iteration $k$. $\mathcal{M}$ (black line) is the natural pose manifold and $\partial C_{\mathbf{x}}$ (blue line) is the class boundary of $c_{\mathbf{x}}$. $\mathbf{x}'_{k-1}$ is the result of the previous iteration. $\tilde{\mathbf{x}}'_k$ is the intermediate result of the current iteration.
Figure 2: The visual comparison with BA. The first row is the clean motion labeled as 'Squeeze'. The second row is the adversarial motion generated by BASAR and misclassified as 'Vomiting'. The third row is the adversarial motion generated by BA and misclassified as 'Vomiting'.
Figure 3: Deviation distributions of on/off-manifold adversarial samples (attacking ST-GCN) on HDM05 dataset.
Figure 4: Metrics versus number of queries on HDM05 with ST-GCN, MSG3D and SGN. UA/TA refers to Untargeted Attack/Targeted Attack.
Figure 5: Metrics versus number of queries on NTU with ST-GCN, MSG3D and SGN.
...and 13 more figures
