E2E-GNet: An End-to-End Skeleton-based Geometric Deep Neural Network for Human Motion Recognition

Mubarak Olaoluwa, Hassen Drira

TL;DR

This work proposes E2E-GNet, an end-to-end geometric deep neural network for skeleton-based human motion recognition. It introduces a geometric transformation layer that jointly optimizes skeleton motion sequences in a non-Euclidean space and applies a differentiable logarithm map activation to project them onto a linear space.

Abstract

Geometric deep learning has recently gained significant attention in the computer vision community for its ability to capture meaningful representations of data lying in a non-Euclidean space. To this end, we propose E2E-GNet, an end-to-end geometric deep neural network for skeleton-based human motion recognition. To enhance the discriminative power between different motions in the non-Euclidean space, E2E-GNet introduces a geometric transformation layer that jointly optimizes skeleton motion sequences on this space and applies a differentiable logarithm map activation to project them onto a linear space. Building on this, we further design a distortion-aware optimization layer that limits the skeleton shape distortions caused by this projection, enabling the network to retain discriminative geometric cues and achieve a higher motion recognition rate. We demonstrate the impact of each layer through ablation studies, and extensive experiments across five datasets spanning three domains show that E2E-GNet outperforms other methods at lower cost.
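
To make the projection step concrete, the following is a minimal NumPy sketch of a logarithm map on a unit sphere, the standard construction for moving a point on the sphere into the tangent (linear) space at a base point. It is an illustration under assumptions, not the paper's implementation: the function names `to_preshape` and `log_map`, the 25-joint 3D skeleton size, and the choice of base point are all hypothetical, and rotation alignment and any learned transform are omitted.

```python
import numpy as np

def to_preshape(skeleton):
    """Center a (joints, dims) skeleton and scale it to unit Frobenius norm.

    The result lies on a unit sphere (assumed here to stand in for the pre-shape space).
    """
    centered = skeleton - skeleton.mean(axis=0, keepdims=True)
    return centered / np.linalg.norm(centered)

def log_map(base, point, eps=1e-8):
    """Sphere logarithm map: tangent vector at `base` that points toward `point`.

    Its length equals the geodesic (arc) distance between the two points.
    """
    inner = np.clip(np.sum(base * point), -1.0, 1.0)
    direction = point - inner * base      # component of `point` orthogonal to `base`
    norm = np.linalg.norm(direction)
    if norm < eps:
        # Coincident points map to zero. Antipodal points, where the log map
        # is undefined, also fall into this branch in this simplified sketch.
        return np.zeros_like(base)
    return np.arccos(inner) * direction / norm

# Illustrative data only: two random 25-joint, 3D skeletons.
rng = np.random.default_rng(0)
base = to_preshape(rng.normal(size=(25, 3)))
frame = to_preshape(rng.normal(size=(25, 3)))
tangent = log_map(base, frame)
```

The resulting `tangent` is orthogonal to `base`, so it lives in a linear space where ordinary convolutional or recurrent layers can operate. Every operation here is differentiable away from the degenerate cases, which is what would allow such a map to act as an activation inside an end-to-end trained network.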

Paper Structure (22 sections, 11 equations, 4 figures, 17 tables)

Figures (4)

  • Figure 1: Illustration of proposed E2E-GNet. The input motion sequences are modeled onto the pre-shape space of a unit sphere. Then the geometric transformation layer (GTL) learns and applies optimal transforms to the sequences for improved feature extraction, with projection onto the tangent space via a log-map activation function. The distortion minimization layer (DML) then acts on these tangent representative skeletons to reduce projection-induced distortions. Finally, convolution and LSTM modules extract discriminative spatio-temporal features for classification, and the optimization is done in an end-to-end manner spanning both manifold and tangent spaces.
  • Figure 2: Five consecutive shapes of normal (left) and Alzheimer's (right) subjects for the Bend Waist exercise in the EHE dataset. Black values are the geodesic distances between the current shape and the previous one.
  • Figure 3: Rotation difference between consecutive frames on NTU-60 X-Sub.
  • Figure 4: Visualization of the DML effect on some representative skeleton shapes for the 'pick up' action of the NTU dataset. Red and blue values are the geodesic distances before and after DML, respectively.
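
The geodesic distances reported in Figures 2 and 4 can be illustrated with a short sketch. It assumes each frame is centered and scaled to unit norm, so that the distance between two frames is the arc length on a unit sphere. The helper names and the random input are hypothetical, and rotation alignment between frames is left out, so the values are not expected to match the paper's.

```python
import numpy as np

def to_preshape(skeleton):
    """Center a (joints, dims) skeleton and scale it to unit Frobenius norm."""
    centered = skeleton - skeleton.mean(axis=0, keepdims=True)
    return centered / np.linalg.norm(centered)

def consecutive_geodesic_distances(sequence):
    """Arc-length distance between each frame and the previous one.

    sequence: array of shape (frames, joints, dims).
    Returns an array of length frames - 1, with values in [0, pi].
    """
    shapes = np.stack([to_preshape(frame) for frame in sequence])
    # Inner product between each frame and its predecessor.
    inner = np.einsum("tjd,tjd->t", shapes[1:], shapes[:-1])
    return np.arccos(np.clip(inner, -1.0, 1.0))

# Illustrative data only: five random frames of a 25-joint, 3D skeleton.
rng = np.random.default_rng(0)
sequence = rng.normal(size=(5, 25, 3))
distances = consecutive_geodesic_distances(sequence)
```

Because every frame is centered and normalized first, the distances do not change when the whole sequence is translated or uniformly scaled. This is why such values can be read as differences in shape rather than differences in position or body size.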