Local Spherical Harmonics Improve Skeleton-Based Hand Action Recognition

Katharina Prasse, Steffen Jung, Yuxuan Zhou, Margret Keuper

TL;DR

This work targets fine-grained hand action recognition within skeleton-based models, addressing inter-subject variability and viewpoint changes by introducing Local Spherical Harmonics representations (LSHR/LSHT) computed from local hand joint neighborhoods. By converting local joint relations into spherical coordinates and encoding them with SH basis functions, the method yields rotation-invariant or rotation-robust features that are concatenated with the standard Cartesian input. Evaluations on FPHA and NTU RGB+D 120 with both GCN-BL and CTR-GCN backbones show consistent improvements, with magnitude-based LSHT often offering the strongest gains in rotation-invariant settings, especially for hand-focused actions. The results demonstrate enhanced robustness and accuracy for hand action recognition, and the authors provide code to facilitate adoption and extension to additional modalities.

Abstract

Hand action recognition is essential: communication, human-robot interaction, and gesture control all depend on it. Skeleton-based action recognition traditionally includes hands, yet hand actions are among the classes that remain challenging to recognize correctly. We propose a method specifically designed for hand action recognition which uses relative angular embeddings and local Spherical Harmonics to create novel hand representations. The use of Spherical Harmonics creates rotation-invariant representations which make hand action recognition more robust against inter-subject differences and viewpoint changes. We conduct extensive experiments on the hand joints in the First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations, and on the NTU RGB+D 120 dataset, demonstrating the benefit of using Local Spherical Harmonics Representations. Our code is available at https://github.com/KathPra/LSHR_LSHT.

Paper Structure (19 sections, 2 equations, 4 figures, 14 tables)

Figures (4)

  • Figure 1: Hands should receive particular emphasis as they contain the highest joint density and their mutual interaction is key to recognising hand actions. Hand joints (left) are first depicted as local spherical coordinates (middle) before being represented in terms of their Local Spherical Harmonics (right). The hand joint representation is then fed into the model as additional input.
  • Figure 2: Conversion between global Cartesian coordinates (left) and local spherical coordinates (right), where each joint's local coordinates are computed relative to every other joint.
  • Figure 3: (a) Conversion between Cartesian coordinates ($x, y, z$) and spherical coordinates ($r, \theta, \phi$). Spherical coordinates consist of the radius $r$, the polar angle $\theta$, and the azimuthal angle $\phi$; (b) Visualization of the real part of Spherical Harmonics, where red indicates positive values and blue indicates negative values. The distance from the origin visualizes the magnitude of the Spherical Harmonics in the respective angular direction.
  • Figure 4: Visualization of hand joints in the First-Person Hand Action Benchmark (FirstPersonAction_CVPR2018; own visualization).
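
The pipeline sketched in Figures 1 to 3 (Cartesian joints, then local spherical coordinates, then Spherical Harmonics) can be illustrated with a small NumPy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `local_spherical`, `real_sh`, and `sh_power_per_joint` are hypothetical, the real Spherical Harmonics are hard-coded only up to degree 2, and the per-degree power used as a rotation-invariant summary is a standard construction chosen here for illustration, which may differ from the paper's LSHR/LSHT definitions. The 21 random points merely stand in for hand joints.

```python
import numpy as np

def local_spherical(joints):
    """Spherical coordinates of every joint relative to every other joint.

    joints: (J, 3) Cartesian positions. Entry [i, j] describes the vector
    from joint i to joint j. Returns r, theta (polar), phi (azimuth).
    """
    diff = joints[None, :, :] - joints[:, None, :]
    r = np.linalg.norm(diff, axis=-1)
    safe_r = np.where(r > 0, r, 1.0)  # avoid division by zero on the diagonal
    theta = np.arccos(np.clip(diff[..., 2] / safe_r, -1.0, 1.0))
    phi = np.arctan2(diff[..., 1], diff[..., 0])
    return r, theta, phi

def real_sh(theta, phi):
    """Real Spherical Harmonics of degree 0, 1, 2 (assumed cut-off).

    Returns one array per degree l with shape (..., 2l + 1).
    """
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    y0 = np.stack([np.full_like(x, 0.5 * np.sqrt(1 / np.pi))], axis=-1)
    c1 = np.sqrt(3 / (4 * np.pi))
    y1 = np.stack([c1 * y, c1 * z, c1 * x], axis=-1)
    c2 = 0.5 * np.sqrt(15 / np.pi)
    y2 = np.stack(
        [
            c2 * x * y,
            c2 * y * z,
            0.25 * np.sqrt(5 / np.pi) * (3 * z**2 - 1),
            c2 * x * z,
            0.5 * c2 * (x**2 - y**2),
        ],
        axis=-1,
    )
    return [y0, y1, y2]

def sh_power_per_joint(joints):
    """Per-joint, per-degree power of the summed SH coefficients.

    For each joint, the SH values of the directions to all other joints are
    summed per order m; the squared norm over m is invariant to rotations.
    Illustrative descriptor only, not the paper's exact feature.
    """
    _, theta, phi = local_spherical(joints)
    mask = ~np.eye(len(joints), dtype=bool)  # drop the zero-length self vector
    feats = []
    for y_l in real_sh(theta, phi):
        coeff = (y_l * mask[..., None]).sum(axis=1)  # (J, 2l + 1)
        feats.append((coeff**2).sum(axis=-1))        # (J,)
    return np.stack(feats, axis=-1)                  # (J, 3)

def random_rotation(rng):
    """Random proper rotation matrix from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

rng = np.random.default_rng(0)
hand = rng.normal(size=(21, 3))            # placeholder joints, not real data
rotated = hand @ random_rotation(rng).T    # same pose seen from another view
same = np.allclose(sh_power_per_joint(hand), sh_power_per_joint(rotated))
print("descriptor unchanged under rotation:", same)
```

Because the descriptor is built from joint-to-joint differences, it is also unaffected by translating the whole hand. In a setup like the one the TL;DR describes, such features would be concatenated with the Cartesian input rather than replace it.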