Physics-constrained Attack against Convolution-based Human Motion Prediction

Chengxu Duan; Zhicheng Zhang; Xiaoli Liu; Yonghao Dang; Jianqin Yin

TL;DR

The paper proposes an adversarial attack that generates a worst-case perturbation by maximizing a human motion predictor's prediction error under physical constraints. An adaptable scheme fits the attack to the scale of the target pose, and two physical constraints keep the adversarial example natural.

Abstract

Convolution-based neural networks have brought strong performance to human motion prediction. However, no prior work has evaluated the risk that adversarial attacks pose to human motion prediction. Adversarial attacks against human motion prediction face two problems: naturalness and data scale. To address them, we propose a new adversarial attack method that generates the worst-case perturbation by maximizing the human motion predictor's prediction error under physical constraints. Specifically, we introduce a novel adaptable scheme that fits the attack to the scale of the target pose, and two physical constraints that enhance the naturalness of the adversarial example. Experiments on three datasets show that the prediction errors of all target models increase significantly, which means that current convolution-based human motion prediction models are vulnerable to the proposed attack. Based on the experimental results, we provide insights into how to enhance the adversarial robustness of human motion predictors and how to improve adversarial attacks against human motion prediction.
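The idea of maximizing prediction error under a pose-scaled perturbation budget can be illustrated with a small sketch. It is not the authors' algorithm: the toy linear predictor, the function names, the sign-gradient update, and the choice of the largest absolute coordinate as the "pose scale" are all assumptions made for illustration. The two physical constraints are not specified in this summary, so they are omitted here.

```python
# Hypothetical sketch: scale-adaptive, sign-gradient attack on a toy
# linear predictor. None of these names come from the paper.

def predict(weights, pose):
    """Toy linear predictor: each output is a weighted sum of the inputs."""
    return [sum(w * x for w, x in zip(row, pose)) for row in weights]

def squared_error(pred, target):
    """Sum of squared differences between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def error_gradient(weights, pose, target):
    """Analytic gradient of the squared error with respect to the input pose."""
    residual = [p - t for p, t in zip(predict(weights, pose), target)]
    return [2.0 * sum(residual[i] * weights[i][j] for i in range(len(weights)))
            for j in range(len(pose))]

def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def attack(weights, pose, target, epsilon=0.01, steps=10):
    """Increase the prediction error while staying within a scaled budget."""
    scale = max(abs(x) for x in pose)   # assumed notion of pose scale
    bound = epsilon * scale             # budget grows with the data scale
    step = bound / steps
    adv = list(pose)
    for _ in range(steps):
        grad = error_gradient(weights, adv, target)
        # Move each coordinate in the direction that increases the error.
        adv = [a + step * sign(g) for a, g in zip(adv, grad)]
        # Project back into the box of half-width `bound` around the clean pose.
        adv = [min(max(a, x - bound), x + bound) for a, x in zip(adv, pose)]
    return adv
```

Scaling the budget by the pose magnitude means the same relative `epsilon` yields a comparable perturbation whether coordinates are stored in millimetres or in normalized units. A real attack would differentiate through the target network instead of using an analytic gradient.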

Paper Structure (32 sections, 9 equations, 4 figures, 35 tables, 1 algorithm)

Introduction
Related Works
Convolution-based Human Motion Prediction
Other Human Motion Prediction
Adversarial Attack
Problem Formulation
Human Motion Prediction
Attack Model
Methodology
Optimization for Attack
Constraints for Perceptual Naturalness
Generation of Perturbations
Experiment
Experimental Settings
Datasets
...and 17 more sections

Figures (4)

Figure 1: An example of the proposed attack against human motion prediction. The poses above the red line are the clean input and output of human motion prediction; the poses below the red line are the perturbed input and the resulting output. The difference between the clean and perturbed inputs is imperceptible, while the disturbance in the prediction is significant.
Figure 2: LTD’s prediction for “walking” before and after the attack. The solid red and blue poses are the clean input and the ground truth. In the input sequences, the purple and green skeletons are the perturbed sequences, with the corresponding attack intensity written on the left. In the output sequences, the dotted skeletons are the poses predicted at 80 ms, 160 ms, 320 ms, 400 ms, 560 ms, and 1000 ms. Because the differences between the clean and perturbed inputs are tiny, the poses with relatively noticeable changes are marked with red boxes.
Figure 3: The different models' predictions for "sitting" before and after the input motion sequence is perturbed, with $\epsilon$ set to 0.01. The ground truth is shown at the top of the figure. Below it, the red and blue skeletons are the clean predictions of the model named on the left, while the dotted purple and green skeletons are the corresponding disturbed predictions. Because the output motion sequence is too long to display in full, the predicted poses shown from left to right correspond to 80 ms, 160 ms, 320 ms, 400 ms, 560 ms, 720 ms, 880 ms, and 1000 ms.
Figure 4: This figure shows how to divide the input human motion sequence into the front, middle, rear, and last parts when the input length is 10 or 50 frames. The number of frames in each part is written under the poses of the corresponding parts.
