InfoGCN++: Learning Representation by Predicting the Future for Online Human Skeleton-based Action Recognition

Seunggeun Chi, Hyung-gun Chi, Qixing Huang, Karthik Ramani

Abstract

Skeleton-based action recognition has advanced significantly in recent years, with models such as InfoGCN achieving remarkable accuracy. However, these models share a key limitation: they require the complete action to be observed before classification, which constrains their applicability in real-time settings such as surveillance and robotic systems. To overcome this limitation, we introduce InfoGCN++, an extension of InfoGCN developed explicitly for online skeleton-based action recognition. InfoGCN++ extends the original InfoGCN model by classifying action types in real time, independent of the length of the observed sequence. It goes beyond conventional approaches by learning from both current and anticipated future movements, thereby building a more complete representation of the entire sequence. We formulate future prediction as an extrapolation problem grounded in the observed actions. To this end, InfoGCN++ incorporates Neural Ordinary Differential Equations, which let it model the continuous evolution of hidden states. In evaluations on three skeleton-based action recognition benchmarks, InfoGCN++ consistently equals or exceeds existing techniques in online action recognition, highlighting its potential for real-time action recognition applications. This work therefore represents a substantial step beyond InfoGCN in online skeleton-based action recognition. The code for InfoGCN++ is publicly available at https://github.com/stnoah1/infogcn2.

Paper Structure (42 sections, 14 equations, 9 figures, 9 tables, 2 algorithms)

This paper contains 42 sections, 14 equations, 9 figures, 9 tables, 2 algorithms.

Figures (9)

  • Figure 1: A visual representation of the InfoGCN++ model. The InfoGCN++ model leverages Neural ODE to predict future movements from given observations, thereby forming comprehensive sequence representations. This anticipatory approach equips the model with the necessary discriminative information for swift and accurate action recognition.
  • Figure 2: Overview of the proposed InfoGCN++. Given the representation of the observation $\mathbf{Z}_t$, InfoGCN++ extrapolates the representation to future frames by solving an initial value problem (IVP) to predict future motion. The representations learned by predicting the future are then used to classify the action at a given observation. The detailed structures of the encoder and classification head are shown in Figure 3.
  • Figure 3: The detailed architecture of (a) the SA-GC module of InfoGCN (Chi et al., 2022), (b) the encoder, (c) the future motion prediction decoder, and (d) the action classification decoder of InfoGCN++.
  • Figure 4: Visualization of a representation $\mathbf{Z}_{t}$ (first row) and the predicted future representations $\mathbf{\hat{Z}}^{(t)}_{t+1:t+N}$.
  • Figure 5: Visualization of different input modification strategies. Different colors represent different temporal positions. White indicates zero-value padding.
  • ...and 4 more figures
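
The Figure 2 caption describes extrapolating a representation to future frames by solving an IVP. The sketch below illustrates only that general idea and is not the authors' implementation: it integrates dz/dt = f(z) from an initial state z_t with a hand-written fourth-order Runge-Kutta solver and reads off the states at t+1, ..., t+N. The names `rk4_step`, `extrapolate`, and `toy_dynamics` are hypothetical, the state is a plain 2-D vector rather than a skeleton representation, and the planar rotation stands in for a learned dynamics network.

```python
import math
from typing import Callable, List

Vector = List[float]


def rk4_step(f: Callable[[Vector], Vector], z: Vector, h: float) -> Vector:
    """Advance z by one fourth-order Runge-Kutta step of size h for dz/dt = f(z)."""
    k1 = f(z)
    k2 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k1)])
    k3 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k2)])
    k4 = f([zi + h * ki for zi, ki in zip(z, k3)])
    return [
        zi + (h / 6.0) * (a + 2.0 * b + 2.0 * c + d)
        for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
    ]


def extrapolate(
    f: Callable[[Vector], Vector],
    z_t: Vector,
    n_future: int,
    substeps: int = 10,
) -> List[Vector]:
    """Solve the IVP z(t) = z_t forward in time.

    Returns the states at t+1, ..., t+n_future, taking `substeps`
    solver steps per unit of time.
    """
    h = 1.0 / substeps
    z = list(z_t)
    future: List[Vector] = []
    for _ in range(n_future):
        for _ in range(substeps):
            z = rk4_step(f, z, h)
        future.append(z)
    return future


def toy_dynamics(z: Vector) -> Vector:
    """Hypothetical stand-in for a learned dynamics network: a planar rotation."""
    omega = 0.5  # assumed angular velocity, chosen only for illustration
    return [-omega * z[1], omega * z[0]]


if __name__ == "__main__":
    z_t = [1.0, 0.0]  # toy "observed" representation at time t
    for k, z in enumerate(extrapolate(toy_dynamics, z_t, n_future=3), start=1):
        print(f"t+{k}: ({z[0]:.4f}, {z[1]:.4f})")
```

In a Neural ODE setting, `toy_dynamics` would presumably be replaced by a trained network and the fixed-step solver by a library ODE solver. The structure stays the same: one initial state in, a sequence of extrapolated future states out.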