E2E-GNet: An End-to-End Skeleton-based Geometric Deep Neural Network for Human Motion Recognition

Mubarak Olaoluwa; Hassen Drira

TL;DR

This work proposes E2E-GNet, an end-to-end geometric deep neural network for skeleton-based human motion recognition. It introduces a geometric transformation layer that jointly optimizes skeleton motion sequences on a non-Euclidean space and applies a differentiable logarithm map activation to project them onto a linear space.

Abstract

Geometric deep learning has recently gained significant attention in the computer vision community for its ability to capture meaningful representations of data lying in a non-Euclidean space. To this end, we propose E2E-GNet, an end-to-end geometric deep neural network for skeleton-based human motion recognition. To enhance the discriminative power between different motions in the non-Euclidean space, E2E-GNet introduces a geometric transformation layer that jointly optimizes skeleton motion sequences on this space and applies a differentiable logarithm map activation to project them onto a linear space. Building on this, we further design a distortion-aware optimization layer that limits the skeleton shape distortions caused by this projection, enabling the network to retain discriminative geometric cues and achieve a higher motion recognition rate. We demonstrate the impact of each layer through ablation studies, and extensive experiments across five datasets spanning three domains show that E2E-GNet outperforms other methods at lower cost.
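The pre-shape modeling and log-map projection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard Kendall construction (center each skeleton at its centroid, then scale it to unit Frobenius norm so it lies on a unit hypersphere) and uses the textbook Riemannian logarithm map of the unit sphere. The function names `to_preshape` and `log_map`, the toy 3-joint skeletons, and the choice of the first frame as the tangent base point are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]


def to_preshape(skeleton: Sequence[Joint]) -> List[float]:
    """Center a skeleton at its centroid and scale it to unit Frobenius norm.

    The result is a point on the unit hypersphere (the pre-shape space),
    returned as a flat list [x1, y1, z1, x2, y2, z2, ...].
    """
    n = len(skeleton)
    centroid = [sum(j[k] for j in skeleton) / n for k in range(3)]
    centered = [j[k] - centroid[k] for j in skeleton for k in range(3)]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        raise ValueError("degenerate skeleton: all joints coincide")
    return [v / norm for v in centered]


def log_map(base: Sequence[float], point: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Logarithm map on the unit sphere: tangent vector at `base` pointing to `point`.

    log_p(q) = theta / sin(theta) * (q - cos(theta) * p), theta = arccos(<p, q>).
    The returned vector is orthogonal to `base` and its norm equals theta,
    the geodesic distance. The map is undefined for antipodal points (theta = pi).
    """
    inner = sum(p * q for p, q in zip(base, point))
    inner = max(-1.0, min(1.0, inner))  # guard acos against rounding error
    theta = math.acos(inner)
    if theta < eps:
        return [0.0] * len(base)  # point coincides with base: zero tangent vector
    scale = theta / math.sin(theta)
    return [scale * (q - inner * p) for p, q in zip(base, point)]


if __name__ == "__main__":
    # Hypothetical toy skeletons with 3 joints each.
    frames = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    ]
    base = to_preshape(frames[0])  # assumption: tangent space at the first frame
    tangent_seq = [log_map(base, to_preshape(f)) for f in frames]
    # Norm of each tangent vector = geodesic distance from the base frame.
    print([round(math.sqrt(sum(v * v for v in t)), 4) for t in tangent_seq])  # prints [0.0, 1.0472]
```

In E2E-GNet the log map serves as a differentiable activation inside the network; in a deep-learning framework the same formula would be written with tensor operations so that gradients flow through it. The base point used here is only one option; a mean shape of the sequence is another common choice in shape analysis.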

Paper Structure

This paper contains 22 sections, 11 equations, 4 figures, and 17 tables.

Introduction
Related Work
Geometric Deep Learning
Skeleton-based Human Motion Recognition
Proposed End-to-End Geometric Network
Modeling of Sequences on Pre-shape Space
Geometric Transformation Layer (GTL)
Distortion Minimization Layer (DML)
Additional GTL and DML Variants
Feature Extraction and Classification
Experimental Datasets and Settings
Results and Discussion
Comparison with State-of-the-Art Methods
Ablation Studies
Proposed DML versus Parallel Transport (PT)
...and 7 more sections

Figures (4)

Figure 1: Illustration of the proposed E2E-GNet. The input motion sequences are modeled on the pre-shape space of a unit sphere. The geometric transformation layer (GTL) then learns and applies optimal transforms to the sequences for improved feature extraction, and projects them onto the tangent space via a log-map activation function. The distortion minimization layer (DML) then acts on these tangent representative skeletons to reduce projection-induced distortions. Finally, convolutional and LSTM modules extract discriminative spatio-temporal features for classification; the whole network is optimized end-to-end across both the manifold and tangent spaces.
Figure 2: Five consecutive shapes of normal (left) and Alzheimer's (right) subjects performing the Bend Waist exercise in the EHE dataset. Black values are the geodesic distances between each shape and the previous one.
Figure 3: Rotation difference between consecutive frames on NTU-60 X-Sub.
Figure 4: Visualization of the DML's effect on representative skeleton shapes for the 'pick up' action of the NTU dataset. Red and blue values are geodesic distances before and after DML, respectively.
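The per-frame geodesic distances shown in Figures 2 and 4 can be illustrated with the sketch below. It rests on an assumption: it measures arc length on the unit pre-shape sphere (skeletons centered and scaled to unit norm), which removes translation and scale but not rotation, whereas the paper may compare shapes after rotation alignment. The helper names `preshape`, `geodesic_distance`, and `consecutive_distances` and the toy skeletons are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]


def preshape(skeleton: Sequence[Joint]) -> List[float]:
    """Centered, unit-norm flattening of a skeleton (a point on the pre-shape sphere)."""
    n = len(skeleton)
    centroid = [sum(j[k] for j in skeleton) / n for k in range(3)]
    flat = [j[k] - centroid[k] for j in skeleton for k in range(3)]
    norm = math.sqrt(sum(v * v for v in flat))
    if norm == 0.0:
        raise ValueError("degenerate skeleton: all joints coincide")
    return [v / norm for v in flat]


def geodesic_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Arc length between two pre-shapes on the unit sphere: arccos(<x, y>)."""
    inner = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, inner)))  # clamp guards against rounding


def consecutive_distances(sequence: Sequence[Sequence[Joint]]) -> List[float]:
    """Geodesic distance between each frame and the frame before it."""
    shapes = [preshape(s) for s in sequence]
    return [geodesic_distance(a, b) for a, b in zip(shapes, shapes[1:])]


if __name__ == "__main__":
    # Hypothetical 3-joint frames: the second moves the third joint from the y- to the z-axis.
    seq = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    ]
    print(consecutive_distances(seq))  # one distance, approximately pi / 3
```

Because this distance lives on the pre-shape sphere, two frames that differ only by a global rotation still get a nonzero distance; a rotation-invariant shape distance would first align the frames, for example with an orthogonal Procrustes step.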
