Local Spherical Harmonics Improve Skeleton-Based Hand Action Recognition
Katharina Prasse, Steffen Jung, Yuxuan Zhou, Margret Keuper
TL;DR
This work targets fine-grained hand action recognition within skeleton-based models, addressing inter-subject variability and viewpoint changes by introducing Local Spherical Harmonics representations (LSHR/LSHT) computed from local hand joint neighborhoods. By converting local joint relations into spherical coordinates and encoding them with SH basis functions, the method yields rotation-invariant or rotation-robust features that are concatenated with the standard Cartesian input. Evaluations on FPHA and NTU RGB+D 120 with both GCN-BL and CTR-GCN backbones show consistent improvements, with magnitude-based LSHT often offering the strongest gains in rotation-invariant settings, especially for hand-focused actions. The results demonstrate enhanced robustness and accuracy for hand action recognition, and the authors provide code to facilitate adoption and extension to additional modalities.
Abstract
Hand action recognition is essential: communication, human-robot interaction, and gesture control all depend on it. Skeleton-based action recognition traditionally includes hands, yet hand actions are among the classes that remain challenging to recognize correctly. We propose a method specifically designed for hand action recognition which uses relative angular embeddings and local Spherical Harmonics to create novel hand representations. The use of Spherical Harmonics yields rotation-invariant representations, which make hand action recognition more robust against inter-subject differences and viewpoint changes. We conduct extensive experiments on the hand joints in the First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations, and on the NTU RGB+D 120 dataset, demonstrating the benefit of using Local Spherical Harmonics Representations. Our code is available at https://github.com/KathPra/LSHR_LSHT.
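To make the rotation-invariance claim concrete, the following is a minimal NumPy sketch of the general idea, not the authors' implementation (see the repository above for that). It assumes a hypothetical hand of 21 random joints, takes the unit directions from one center joint to all other joints, sums real Spherical Harmonics of degree 0 to 2 over those directions, and keeps only the per-degree magnitude of the coefficients. The function names `real_sh_l0_to_l2` and `local_sh_magnitudes`, the maximum degree, and the neighborhood choice are illustrative assumptions. The Cartesian polynomials used here are equivalent to evaluating the harmonics at the polar and azimuthal angles of each direction.

```python
import numpy as np

def real_sh_l0_to_l2(dirs):
    """Real spherical harmonics of degree 0, 1, 2 for unit vectors of shape (N, 3)."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    l0 = np.full((len(dirs), 1), 0.5 * np.sqrt(1 / np.pi))
    l1 = np.sqrt(3 / (4 * np.pi)) * np.stack([y, z, x], axis=1)
    l2 = np.stack([
        0.5 * np.sqrt(15 / np.pi) * x * y,
        0.5 * np.sqrt(15 / np.pi) * y * z,
        0.25 * np.sqrt(5 / np.pi) * (3 * z**2 - 1),
        0.5 * np.sqrt(15 / np.pi) * x * z,
        0.25 * np.sqrt(15 / np.pi) * (x**2 - y**2),
    ], axis=1)
    return [l0, l1, l2]

def local_sh_magnitudes(joints, center):
    """Per-degree coefficient magnitudes for the neighborhood of one joint (illustrative)."""
    # Offsets from the center joint to every other joint (assumed neighborhood).
    offsets = np.delete(joints, center, axis=0) - joints[center]
    dirs = offsets / np.linalg.norm(offsets, axis=1, keepdims=True)
    # Sum each harmonic over the neighborhood, then take the norm within each degree.
    coeffs = [band.sum(axis=0) for band in real_sh_l0_to_l2(dirs)]
    return np.array([np.linalg.norm(c) for c in coeffs])

rng = np.random.default_rng(0)
joints = rng.normal(size=(21, 3))        # hypothetical hand skeleton, not real data

# Build a random proper rotation from a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1                        # flip one axis so that det(q) = +1

feats = local_sh_magnitudes(joints, center=0)
feats_rot = local_sh_magnitudes(joints @ q.T, center=0)
print(np.allclose(feats, feats_rot))     # magnitudes should match up to rounding

# Concatenate with the Cartesian input, as the summary describes.
joint_input = np.concatenate([joints[0], feats])
```

The magnitudes stay the same because a rotation only mixes coefficients within a degree, and that mixing is orthogonal, so the norm of each degree's coefficient vector is preserved. The Cartesian coordinates in `joint_input` do change under rotation, which is why the invariant features are added alongside them rather than replacing them.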
