Online hand gesture recognition using Continual Graph Transformers
Rim Slama, Wael Rabah, Hazem Wannous
TL;DR
This work tackles online, real-time hand gesture recognition from 3D hand skeleton sequences. It introduces CoSTrGCN, a hybrid architecture that first applies Spatial Graph Convolutional Networks (S-GCN) to extract frame-wise spatial features and then a Transformer-based Graph Encoder (TGE) to capture temporal dependencies across frames, augmented by a continual learning mechanism for streaming data. The authors report competitive performance on SHREC'21, with strong detection and Jaccard index scores and a low false positive rate, and discuss the practical implications for human-robot interaction and assistive technologies. The approach is notable for combining continual inference with graph-based transformers, which enables robust, low-latency online gesture recognition in dynamic environments.
Abstract
Online continuous action recognition has emerged as a critical research area due to its practical implications in real-world applications, such as human-computer interaction, healthcare, and robotics. Among various modalities, skeleton-based approaches have gained significant popularity, demonstrating their effectiveness in capturing 3D temporal data while ensuring robustness to environmental variations. However, most existing works focus on segment-based recognition, making them unsuitable for real-time, continuous recognition scenarios. In this paper, we propose a novel online recognition system designed for real-time skeleton sequence streaming. Our approach leverages a hybrid architecture combining Spatial Graph Convolutional Networks (S-GCN) for spatial feature extraction and a Transformer-based Graph Encoder (TGE) for capturing temporal dependencies across frames. Additionally, we introduce a continual learning mechanism to enhance model adaptability to evolving data distributions, ensuring robust recognition in dynamic environments. We evaluate our method on the SHREC'21 benchmark dataset, demonstrating its superior performance in online hand gesture recognition. Our approach not only achieves state-of-the-art accuracy but also significantly reduces false positive rates, making it a compelling solution for real-time applications. The proposed system can be seamlessly integrated into various domains, including human-robot collaboration and assistive technologies, where natural and intuitive interaction is crucial.
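The two-stage pipeline described above (a per-frame spatial graph convolution followed by temporal attention over a stream of frames) can be illustrated with a minimal sketch. This is not the authors' CoSTrGCN implementation: the row-normalised adjacency, the mean pooling over joints, the single attention head without learned projections, the fixed-size sliding window, and all names (`normalized_adjacency`, `spatial_graph_conv`, `attend_latest`, `StreamingGestureEncoder`) are assumptions chosen to keep the example small and dependency-free.

```python
import math
from collections import deque


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(num_joints, edges):
    """Row-normalised adjacency with self-loops, D^-1 (A + I) (assumed normalisation)."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def spatial_graph_conv(joints, adj_norm, weight):
    """One graph convolution on a single frame: ReLU(A_norm X W), mean-pooled over joints."""
    hidden = matmul(matmul(adj_norm, joints), weight)
    hidden = [[max(0.0, v) for v in row] for row in hidden]
    return [sum(col) / len(hidden) for col in zip(*hidden)]


def attend_latest(window):
    """Scaled dot-product attention of the newest frame feature over the whole window.

    Query, keys and values are the raw frame features (no learned projections).
    """
    query = window[-1]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in window]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * feat[d] for w, feat in zip(weights, window))
               for d in range(len(query))]
    return context, weights


class StreamingGestureEncoder:
    """Hypothetical online encoder: caches per-frame features in a sliding window."""

    def __init__(self, num_joints, edges, weight, window_size):
        self.adj_norm = normalized_adjacency(num_joints, edges)
        self.weight = weight
        self.window = deque(maxlen=window_size)  # oldest frame is dropped automatically

    def step(self, joints):
        """Consume one skeleton frame (num_joints x 3) and return a temporal feature."""
        self.window.append(spatial_graph_conv(joints, self.adj_norm, self.weight))
        context, _ = attend_latest(list(self.window))
        return context


# Toy usage: a 3-joint chain with 3D coordinates projected to 2 features.
toy_edges = [(0, 1), (1, 2)]
toy_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoder = StreamingGestureEncoder(3, toy_edges, toy_weight, window_size=3)
for t in range(5):
    frame = [[float(t), 0.0, 1.0], [0.0, float(t), 1.0], [1.0, 1.0, float(t)]]
    feature = encoder.step(frame)
print(feature)
```

In this sketch each incoming frame is processed exactly once by the spatial stage, and only the newest frame issues an attention query over the cached features, so the per-step cost stays bounded by the window size. A trained model would add learned projections, multiple heads, and a classification head on top of the returned feature.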
