InfoGCN++: Learning Representation by Predicting the Future for Online Human Skeleton-based Action Recognition

Seunggeun Chi; Hyung-gun Chi; Qixing Huang; Karthik Ramani

Abstract

Skeleton-based action recognition has advanced significantly in recent years, with models such as InfoGCN achieving high accuracy. However, these models share a key limitation: they require the complete action to be observed before classification, which restricts their use in real-time settings such as surveillance and robotic systems. To overcome this limitation, we introduce InfoGCN++, an extension of InfoGCN developed explicitly for online skeleton-based action recognition. InfoGCN++ classifies action types in real time, regardless of the length of the observed sequence. It goes beyond conventional approaches by learning from both current and anticipated future movements, which yields a more complete representation of the entire sequence. We formulate future prediction as an extrapolation problem grounded in the observed motion. To this end, InfoGCN++ incorporates Neural Ordinary Differential Equations, which let it model the continuous evolution of hidden states. In evaluations on three skeleton-based action recognition benchmarks, InfoGCN++ consistently matches or exceeds existing techniques in online action recognition, highlighting its potential for real-time action recognition applications. This work thus extends InfoGCN from the offline setting to online, skeleton-based action recognition. The code for InfoGCN++ is publicly available at https://github.com/stnoah1/infogcn2.
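The extrapolation idea in the abstract can be illustrated with a small numerical sketch. It is not the InfoGCN++ implementation: the function names (`rk4_step`, `extrapolate`, `toy_dynamics`), the two-dimensional state, and the fixed-step Runge-Kutta solver are all assumptions chosen for illustration, and the hand-written dynamics stand in for what would be a learned network in a Neural ODE. The sketch only shows how a hidden state observed at frame t can be carried forward to later frames by solving an initial value problem.

```python
import math
from typing import Callable, List

Vector = List[float]


def rk4_step(f: Callable[[Vector], Vector], z: Vector, h: float) -> Vector:
    """Advance dz/dt = f(z) by one classical Runge-Kutta step of size h."""
    def shifted(scale: float, k: Vector) -> Vector:
        # Returns z + scale * k, the intermediate evaluation point.
        return [zi + scale * ki for zi, ki in zip(z, k)]

    k1 = f(z)
    k2 = f(shifted(h / 2, k1))
    k3 = f(shifted(h / 2, k2))
    k4 = f(shifted(h, k3))
    return [
        zi + h / 6 * (a + 2 * b + 2 * c + d)
        for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
    ]


def extrapolate(
    f: Callable[[Vector], Vector],
    z_t: Vector,
    n_future: int,
    substeps: int = 10,
) -> List[Vector]:
    """Solve the IVP z(t) = z_t and return states at frames t+1 .. t+n_future."""
    h = 1.0 / substeps  # one frame is treated as one unit of time (assumption)
    z = list(z_t)
    future = []
    for _ in range(n_future):
        for _ in range(substeps):
            z = rk4_step(f, z, h)
        future.append(z)
    return future


def toy_dynamics(z: Vector) -> Vector:
    """Hypothetical stand-in for a learned dynamics network: a damped rotation."""
    x, y = z
    return [-0.1 * x - y, x - 0.1 * y]


# Hidden state at the last observed frame, extrapolated three frames ahead.
z_hat = extrapolate(toy_dynamics, [1.0, 0.0], n_future=3)
```

In a Neural ODE setting, `toy_dynamics` would be replaced by a trainable network and the solver by a differentiable ODE solver, so that the predicted future states can contribute to the training loss.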

Paper Structure (42 sections, 14 equations, 9 figures, 9 tables, 2 algorithms)

Introduction
Related Works
Offline Skeleton-based Action Recognition
Skeleton-based Early Action Prediction
Neural Ordinary Differential Equation
Online Skeleton-based Action Recognition
Preliminaries
Initial Value Problem and NeuralODE
Self-Attention Graph Convolution
InfoGCN++
Architecture overview
Embedding Layer
Encoder
Future Motion Predictor
Task-specific Decoders
...and 27 more sections

Figures (9)

Figure 1: A visual representation of the InfoGCN++ model. The InfoGCN++ model leverages Neural ODE to predict future movements from given observations, thereby forming comprehensive sequence representations. This anticipatory approach equips the model with the necessary discriminative information for swift and accurate action recognition.
Figure 2: Overview of the proposed InfoGCN++. Given the representation of the observation $\mathbf{Z}_t$, InfoGCN++ extrapolates the representation to future frames by solving the IVP to predict future motion. The representations learned by predicting the future are then used to classify the action at a given observation. The detailed structures of the encoder and classification head are shown in Figure 3.
Figure 3: The detailed architecture of (a) the SA-GC module (Chi et al., 2022), (b) the encoder, (c) the future motion prediction decoder, and (d) the action classification decoder of InfoGCN++.
Figure 4: The visualization of a representation $\mathbf{Z}_{t}$ (first row) and predicted future representations $\mathbf{\hat{Z}}^{(t)}_{t+1:t+N}$.
Figure 5: Visualization of different input modification strategies. Different colors represent different temporal positions. The white color indicates the zero-value padding.
...and 4 more figures
