Bones of Contention: Exploring Query-Efficient Attacks against Skeleton Recognition Systems
Yuxin Cao, Kai Ye, Derui Wang, Minhui Xue, Hao Ge, Chenxiong Qian, Jin Song Dong
TL;DR
This work exposes critical vulnerabilities in skeleton-based action recognition under black-box settings by introducing ISAAC-K, a query-efficient attack that uses Grad-CAM on a surrogate model to locate key joints and applies constrained perturbations to preserve motion realism. It also reveals a surprising weakness via ISAAC-N, in which joints unrelated to the action are replaced to mislead classifiers without issuing any queries. The authors evaluate ISAAC-K on multiple datasets and models, report substantial efficiency gains over prior methods, and complement the quantitative findings with a user study. They further propose four adaptive defenses to bolster robustness, highlighting the need for defense mechanisms tailored to the unique structure and dynamics of skeletal data.
Abstract
Skeleton action recognition models have attracted more attention than video-based ones in various applications due to privacy preservation and lower storage requirements. Skeleton data are typically transmitted to cloud servers for action recognition, with results returned to clients via Apps/APIs. However, the vulnerability of skeletal models to adversarial perturbations has gradually revealed the unreliability of these systems. Existing black-box attacks all operate in a decision-based manner, resulting in numerous queries that hinder efficiency and feasibility in real-world applications. Moreover, all existing attacks focus only on restricted perturbations and ignore model weaknesses under non-semantic perturbations. In this paper, we propose two query-effIcient Skeletal Adversarial AttaCks, ISAAC-K and ISAAC-N. As a black-box attack, ISAAC-K applies Grad-CAM to a surrogate model to extract key joints, to which minor sparse perturbations are then added to fool the classifier. To guarantee natural adversarial motions, we introduce constraints on both bone length and temporal consistency. ISAAC-K finds stronger adversarial examples under the $\ell_\infty$ norm, which can encompass those under other norms. Extensive experiments substantiate that ISAAC-K improves the attack efficiency of the perturbations across 10 skeletal models. Additionally, as a byproduct, ISAAC-N fools the classifier by replacing skeletons unrelated to the action. Surprisingly, we find that skeletal models are vulnerable to large perturbations in which the part-wise non-semantic joints are simply replaced, leading to a query-free no-box attack that requires no prior knowledge. Based on these findings, we propose four adaptive defenses to improve the robustness of skeleton recognition models.
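As an illustration of the ingredients named in the abstract (key-joint selection, sparse perturbations under an $\ell_\infty$ budget, and bone-length and temporal checks), the following is a minimal sketch rather than the authors' implementation. It makes several assumptions: the per-joint saliency scores are given as a plain array (in the paper they come from Grad-CAM on a surrogate model), random noise stands in for the optimized perturbation, and the two constraints are only measured, not enforced. All function names, the toy skeleton, and the parameter values are hypothetical.

```python
import numpy as np

def select_key_joints(saliency, k):
    """Return the indices of the k joints with the highest saliency score."""
    order = np.argsort(saliency)[::-1]  # joint indices, most salient first
    return order[:k]

def sparse_linf_perturb(motion, key_joints, epsilon, rng):
    """Add noise bounded by epsilon (l_inf) to the key joints only.

    motion has shape (frames, joints, coords). Random noise is an
    assumption standing in for an optimized perturbation.
    """
    delta = np.zeros_like(motion)
    noise_shape = (motion.shape[0], len(key_joints), motion.shape[2])
    delta[:, key_joints, :] = rng.uniform(-epsilon, epsilon, size=noise_shape)
    return motion + delta

def bone_length_deviation(original, perturbed, bones):
    """Largest absolute change in any bone length over all frames."""
    parents = [p for p, _ in bones]
    children = [c for _, c in bones]

    def lengths(m):
        return np.linalg.norm(m[:, children, :] - m[:, parents, :], axis=-1)

    return float(np.max(np.abs(lengths(perturbed) - lengths(original))))

def temporal_roughness(motion):
    """Mean squared second difference over time (needs at least 3 frames)."""
    accel = motion[2:] - 2.0 * motion[1:-1] + motion[:-2]
    return float(np.mean(accel ** 2))

# Toy data: 5 frames, 4 joints in a chain, 3D coordinates (all hypothetical).
rng = np.random.default_rng(0)
motion = rng.normal(size=(5, 4, 3))
saliency = np.array([0.1, 0.9, 0.3, 0.7])
bones = [(0, 1), (1, 2), (2, 3)]

key_joints = select_key_joints(saliency, k=2)
adversarial = sparse_linf_perturb(motion, key_joints, epsilon=0.01, rng=rng)

print("key joints:", key_joints)
print("bone length deviation:", bone_length_deviation(motion, adversarial, bones))
print("roughness change:", temporal_roughness(adversarial) - temporal_roughness(motion))
```

In an actual attack, the two diagnostic quantities would presumably act as constraints or penalty terms while the perturbation is optimized. Here they only show how far a sparse, bounded perturbation moves the skeleton away from fixed bone lengths and smooth motion.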
