HaGRIDv2: 1M Images for Static and Dynamic Hand Gesture Recognition
Anton Nuzhdin, Alexander Nagaev, Alexander Sautin, Alexander Kapitanov, Karina Kvanchiani
TL;DR
HaGRIDv2 tackles the need for a comprehensive, large-scale hand gesture dataset suitable for both static and dynamic recognition in real-world HCI scenarios like video conferencing and home automation. It introduces 15 new static gestures and a diversified 'no gesture' class, plus an extended dynamic gesture recognition algorithm that supports swipes, zooms, clicks, and drag-and-drops, all while maintaining lightweight CPU-friendly inference. The work demonstrates improved cross-dataset generalization, stronger pre-training benefits, and enhanced gesture generation quality via diffusion models, supported by extensive ablations and cross-dataset evaluations. By releasing HaGRIDv2, pre-trained models, and the dynamic gesture algorithm, the study provides a practical, scalable resource for developing robust gesture-based interfaces on edge devices.
Abstract
This paper presents HaGRIDv2, the second version of the widely used hand gesture recognition dataset HaGRID. We cover 15 new gestures with conversation and control functions, including two-handed ones. Building on the foundational concepts proposed by HaGRID's authors, we implemented the dynamic gesture recognition algorithm and further enhanced it by adding three new groups of manipulation gestures. The "no gesture" class was diversified by adding samples of natural hand movements, which allowed us to reduce false positives by a factor of 6. Combining the extra samples with HaGRID, the resulting version outperforms the original for pre-training models on gesture-related tasks. It also achieves the best generalization ability among gesture and hand detection datasets. In addition, the second version improves the quality of gestures generated by a diffusion model. HaGRIDv2, pre-trained models, and the dynamic gesture recognition algorithm are publicly available.
