Fourier Analysis on Robustness of Graph Convolutional Neural Networks for Skeleton-based Action Recognition

Nariki Tanaka; Hiroshi Kera; Kazuhiko Kawamoto

TL;DR

This paper investigates the robustness of graph convolutional networks (GCNs) for skeleton-based action recognition under adversarial attacks and common corruptions. It adopts the joint Fourier transform (JFT), which combines the graph Fourier transform (GFT) and the discrete Fourier transform (DFT), to analyze robustness in the frequency domain and to visualize sensitivity via Fourier heatmaps on the NTU RGB+D dataset. A key finding is that adversarial training does not introduce the trade-off between robustness to adversarial attacks and robustness to low-frequency perturbations that typically occurs in CNN-based image classification. However, Fourier analysis cannot explain the vulnerability to skeletal part occlusion. The work advances understanding of GCN robustness for skeleton-based action recognition and highlights the limitations of frequency-based explanations, pointing to future directions including alternative analyses and transformer-era approaches for robustness assessment.

Abstract

Using Fourier analysis, we explore the robustness and vulnerability of graph convolutional neural networks (GCNs) for skeleton-based action recognition. We adopt a joint Fourier transform (JFT), a combination of the graph Fourier transform (GFT) and the discrete Fourier transform (DFT), to examine the robustness of adversarially-trained GCNs against adversarial attacks and common corruptions. Experimental results with the NTU RGB+D dataset reveal that adversarial training does not introduce a robustness trade-off between adversarial attacks and low-frequency perturbations, which typically occurs during image classification based on convolutional neural networks. This finding indicates that adversarial training is a practical approach to enhancing robustness against adversarial attacks and common corruptions in skeleton-based action recognition. Furthermore, we find that the Fourier approach cannot explain vulnerability against skeletal part occlusion corruption, which highlights its limitations. These findings extend our understanding of the robustness of GCNs, potentially guiding the development of more robust learning methods for skeleton-based action recognition.
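To make the JFT concrete, the following is a minimal NumPy sketch of a GFT applied per frame followed by a DFT along time. It is not the authors' implementation: the combinatorial Laplacian $L = D - A$, the toy 5-joint chain graph, the `(frames, joints, channels)` array layout, and the function names `graph_fourier_basis`, `jft`, and `inverse_jft` are all assumptions made for illustration.

```python
import numpy as np


def graph_fourier_basis(adjacency):
    """Return eigenvalues and eigenvectors of L = D - A (assumed Laplacian).

    np.linalg.eigh returns eigenvalues in ascending order, so column k of
    the eigenvector matrix corresponds to the k-th lowest graph frequency.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return eigvals, eigvecs


def jft(x, eigvecs):
    """Joint Fourier transform of x with shape (frames, joints, channels)."""
    # GFT: project the joints of every frame onto the Laplacian eigenvectors.
    x_gft = np.einsum("vk,tvc->tkc", eigvecs, x)
    # DFT: transform each graph-frequency coefficient along the time axis.
    return np.fft.fft(x_gft, axis=0)


def inverse_jft(spectrum, eigvecs):
    """Invert jft: inverse DFT along time, then inverse GFT over joints."""
    x_gft = np.fft.ifft(spectrum, axis=0)
    return np.einsum("vk,tkc->tvc", eigvecs, x_gft).real


# Hypothetical toy skeleton: 5 joints connected in a chain.
num_frames, num_joints, num_channels = 16, 5, 3
adjacency = np.zeros((num_joints, num_joints))
for i in range(num_joints - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0

eigvals, eigvecs = graph_fourier_basis(adjacency)
rng = np.random.default_rng(0)
x = rng.standard_normal((num_frames, num_joints, num_channels))

spectrum = jft(x, eigvecs)              # complex array, same shape as x
x_rec = inverse_jft(spectrum, eigvecs)  # recovers x up to rounding error
```

In this layout, axis 1 of `spectrum` indexes spatial (graph) frequency and axis 0 indexes temporal frequency. Because the eigenvector matrix is orthogonal, the transform is invertible.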

Paper Structure (25 sections, 12 equations, 13 figures, 7 tables)

Introduction
Related work
Robustness of Skeleton-based Action Recognition
Fourier analysis of CNN-based image classification
Fourier Analysis for Skeleton-based Action Recognition
Spatiotemporal Graph for Skeletal Sequence Data
Standard & Adversarial Training
Discrete & Graph Fourier Transforms
Joint Fourier Transform and Fourier Heatmap
Experiment
Experimental Setting
Dataset
Model
Adversarial Attack
Adversarial Training
...and 10 more sections

Figures (13)

Figure 1: Flow of the joint Fourier transform (JFT) on skeletal sequence data, which encompasses both the graph Fourier transform (GFT) and the discrete Fourier transform (DFT). The GFT is first applied to the skeletal data at each frame, followed by the DFT.
Figure 2: Spatial low-pass (leftmost) and high-pass (second from left) filtering are performed by masking the Fourier spectrum along the spatial frequency axis. Temporal low-pass (second from right) and high-pass (rightmost) filtering are performed by masking the Fourier spectrum along the temporal frequency axis.
Figure 3: Average Fourier spectrum over all test skeleton data for joint, joint motion, bone, and bone motion features. The vertical and horizontal axes represent spatial and temporal frequency, respectively. For example, these figures show that the Fourier spectrum of the joint feature (leftmost) is concentrated at low frequencies in the spatiotemporal frequency domain.
Figure 4: Adversarial examples generated by $l_2$-PGD. Clean (blue) and adversarial (red) examples are superimposed for three example actions. The two almost overlap, so the perturbations are nearly imperceptible. For each action, the action labels before and after the adversarial attack are provided.
Figure 5: Fourier heatmaps of standard-trained (ST) and adversarially-trained (AT) GCNs. The top and bottom three rows display those of the ST-GCNs and TCA-GCNs, respectively, for each of the four features (joint, joint motion, bone, bone motion) and three perturbation norms $v\in\{0.5,1.5,3.0\}$.
...and 8 more figures
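
The masking described in the Figure 2 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the chain graph, the combinatorial Laplacian, the cutoff values, and the helper names `chain_basis`, `spatial_lowpass_mask`, `temporal_lowpass_mask`, and `filter_sequence` are all hypothetical.

```python
import numpy as np


def chain_basis(num_joints):
    """Laplacian eigenvectors of a chain graph (assumed toy skeleton)."""
    adjacency = np.zeros((num_joints, num_joints))
    idx = np.arange(num_joints - 1)
    adjacency[idx, idx + 1] = 1.0
    adjacency[idx + 1, idx] = 1.0
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)  # columns sorted by frequency
    return eigvecs


def spatial_lowpass_mask(num_joints, num_keep):
    """Keep the num_keep lowest graph frequencies."""
    return np.arange(num_joints) < num_keep


def temporal_lowpass_mask(num_frames, cutoff):
    """Keep DFT bins with |frequency| <= cutoff (cycles per frame).

    The mask is symmetric in positive and negative frequencies, so the
    filtered signal stays real.
    """
    return np.abs(np.fft.fftfreq(num_frames)) <= cutoff


def filter_sequence(x, eigvecs, spatial_keep=None, temporal_keep=None):
    """Mask the joint spectrum of x, shape (frames, joints, channels)."""
    # Forward transform: GFT over joints, then DFT over frames.
    spec = np.fft.fft(np.einsum("vk,tvc->tkc", eigvecs, x), axis=0)
    if spatial_keep is not None:
        spec = spec * spatial_keep[None, :, None]
    if temporal_keep is not None:
        spec = spec * temporal_keep[:, None, None]
    # Inverse transform: inverse DFT, then inverse GFT.
    x_gft = np.fft.ifft(spec, axis=0)
    return np.einsum("vk,tkc->tvc", eigvecs, x_gft).real


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 5, 3))
eigvecs = chain_basis(5)

s_mask = spatial_lowpass_mask(5, num_keep=2)
t_mask = temporal_lowpass_mask(16, cutoff=0.125)

spatial_low = filter_sequence(x, eigvecs, spatial_keep=s_mask)
spatial_high = filter_sequence(x, eigvecs, spatial_keep=~s_mask)
temporal_low = filter_sequence(x, eigvecs, temporal_keep=t_mask)
temporal_high = filter_sequence(x, eigvecs, temporal_keep=~t_mask)
```

Since each high-pass mask is the complement of its low-pass mask, the low-pass and high-pass outputs along either axis sum back to the original sequence.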
