Accurate online action and gesture recognition system using detectors and Deep SPD Siamese Networks

Mohamed Sanim Akremi, Rim Slama, Hedi Tabia

TL;DR

This paper addresses online skeleton-based action and gesture recognition by introducing an SPD matrix-based representation learned through a Siamese network, enabling continuous detection of motion intervals in unsegmented sequences. The online system couples a detector (for kinetic state transitions) with a verifier and an SPD-Siamese classifier, yielding early and accurate recognition while maintaining real-time performance. The approach extends previous offline manifold methods into a practical online pipeline with a partitioned body/hand representation, SPD learning via GA/ReEig/LogEig/VecMap, and a contrastive loss framework, followed by a $K$-NN decision rule. Empirical validation across five diverse datasets demonstrates high accuracy on segmented data and robust online performance, with results competitive with or superior to state-of-the-art online methods and efficient runtime (e.g., ~69 ms per prediction on SHREC 2021). This work provides a scalable, real-time solution for online gesture and action recognition in real-world settings and points to future improvements in classifier architecture and industrial HRI contexts.
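The embedding and decision steps named above (ReEig, LogEig, VecMap, contrastive loss, $K$-NN) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the learned layers (including GA) are omitted, the contrastive loss is the standard margin-based form, the VecMap scaling is one common convention, and all function names and parameters (`eps`, `margin`, `k`) are assumptions chosen for illustration.

```python
import numpy as np

def re_eig(spd, eps=1e-4):
    """ReEig-style rectification: clamp eigenvalues from below at eps."""
    w, v = np.linalg.eigh(spd)
    return (v * np.maximum(w, eps)) @ v.T

def log_eig(spd):
    """LogEig: matrix logarithm via eigendecomposition (SPD -> symmetric)."""
    w, v = np.linalg.eigh(spd)
    return (v * np.log(w)) @ v.T

def vec_map(sym):
    """Assumed VecMap: flatten the upper triangle, scaling off-diagonal
    entries by sqrt(2) so the vector norm equals the Frobenius norm."""
    rows, cols = np.triu_indices(sym.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return sym[rows, cols] * scale

def embed(spd):
    """Untrained embedding: ReEig -> LogEig -> VecMap (no learned weights)."""
    return vec_map(log_eig(re_eig(spd)))

def contrastive_loss(e1, e2, same, margin=1.0):
    """Standard margin-based contrastive loss on a pair of embeddings."""
    d = np.linalg.norm(e1 - e2)
    return d ** 2 if same else max(0.0, margin - d) ** 2

def knn_predict(query, gallery, labels, k=3):
    """Majority vote among the k nearest gallery embeddings (Euclidean)."""
    dists = np.linalg.norm(gallery - query, axis=1)
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(np.asarray(labels)[nearest], return_counts=True)
    return values[np.argmax(counts)]
```

In a Siamese setup, both inputs of a pair pass through the same embedding, and the loss pulls same-class pairs together while pushing different-class pairs at least `margin` apart; the $K$-NN rule then classifies a new embedding against labeled reference embeddings.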

Abstract

Online continuous motion recognition is an active research topic because it is more practical in real-life applications. Recently, skeleton-based approaches have become increasingly popular, demonstrating the power of such 3D temporal data. However, most of these works focus on segment-based recognition and are not suitable for online scenarios. In this paper, we propose an online recognition system for streaming skeleton sequences composed of two main components, a detector and a classifier, both of which use a Symmetric Positive Definite (SPD) matrix representation and a Siamese network. The powerful statistical representation of skeletal data given by the SPD matrices, together with the semantic similarity learned by the Siamese network, enables the detector to predict the time intervals of motions throughout an unsegmented sequence. It also enables the classifier to recognize the motion in each predicted interval. The proposed detector is flexible and able to identify the kinetic state continuously. We conduct extensive experiments on both hand gesture and body action recognition benchmarks to demonstrate the accuracy of our online recognition system, which in most cases outperforms state-of-the-art methods.
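To make the idea of an SPD representation of streaming skeleton data concrete, here is a minimal sketch that summarizes each sliding window of frames by a regularized covariance matrix. It is an assumption-laden simplification: the paper's representation is partitioned by body or hand parts and is learned, whereas this example uses a plain covariance over all joint coordinates, and the names `window_covariance`, `sliding_spd`, `window`, `stride`, and `reg` are hypothetical.

```python
import numpy as np

def window_covariance(frames, reg=1e-6):
    """Covariance of joint coordinates over a window.

    frames: array of shape (T, J, 3) holding T frames of J 3D joints.
    Returns a (3J, 3J) matrix; adding reg * I keeps it strictly positive
    definite even when T is smaller than 3J.
    """
    t = frames.shape[0]
    x = frames.reshape(t, -1)
    x = x - x.mean(axis=0, keepdims=True)
    cov = x.T @ x / max(t - 1, 1)
    return cov + reg * np.eye(cov.shape[0])

def sliding_spd(stream, window=10, stride=5):
    """Yield (start_index, SPD matrix) for each window of an unsegmented stream."""
    for start in range(0, len(stream) - window + 1, stride):
        yield start, window_covariance(stream[start:start + window])
```

A detector or classifier operating online would consume these per-window matrices as they become available, rather than waiting for a pre-segmented sequence.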

Paper Structure

This paper contains 18 sections, 6 figures, 7 tables.

Figures (6)

  • Figure 1: The proposed online motion recognition system.
  • Figure 2: The overview of the proposed SPD Siamese network.
  • Figure 3: Skeleton parts and matrix representation for (a) the hand and (b) the body.
  • Figure 4: ST-TS-HGR-NET architecture.
  • Figure 5: Verification process with $te=5$.
  • ...and 1 more figure