Pose2Gest: A Few-Shot Model-Free Approach Applied In South Indian Classical Dance Gesture Recognition
Kavitha Raju, Nandini J. Warrier, Manu Madhavan, Selvi C., Arun B. Warrier, Thulasi Kumar
TL;DR
Pose2Gest addresses mudra recognition for Indian classical dance under severe data scarcity with a model-free pipeline. It uses pose estimation to build a $63$-dimensional hand-landmark vector, normalizes it with a transformation $T$ computed as $T = P \cdot S^{-1}$, stores reference vectors in a vector database, and classifies a query by its Euclidean similarity to those references. It achieves up to $92\%$ accuracy on 24-class Hasta Mudra data and shows competitive performance on Kathakali and Bharatanatyam datasets without training a neural network. The work contributes a publicly released Hasta Mudra dataset and a web tool for crowd-sourced data collection, and it demonstrates practical applicability to real-time input and sign-language tasks, highlighting data-efficient digitization of cultural heritage. By combining pose-based features with simple vector similarity, Pose2Gest enables scalable, low-data gesture recognition across related art forms and beyond, while paving the way for word-level interpretation and broader sign-language applications.
Abstract
The classical dances of India use a set of hand gestures known as Mudras, which serve as the foundational elements of their posture vocabulary. Identifying these mudras is a primary task in digitizing dance performances. With Kathakali, a dance-drama, as the focus, this work addresses mudra recognition by framing it as a 24-class classification problem and proposes a novel vector-similarity-based approach that leverages pose estimation techniques. The method obviates the need for extensive training or fine-tuning, thus mitigating the limited data availability common in similar AI applications. With an accuracy of 92%, our approach performs comparably to or better than existing model-training-based methods in this domain. Notably, it remains effective even with small datasets comprising just 1 or 5 samples, albeit with slightly diminished performance. Furthermore, our system processes images, videos, and real-time streams, and accommodates both hand-cropped and full-body images. As part of this research, we have curated and released a publicly accessible Hasta Mudra dataset, which is applicable to multiple South Indian art forms including Kathakali. The implementation of the proposed method is also made available as a web application.
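To make the vector-similarity idea concrete, the following is a minimal sketch of few-shot classification by nearest reference vector. It is an illustration under stated assumptions, not the authors' implementation: the 63 values are assumed to be 21 hand landmarks with 3 coordinates each, the `normalize` function is a simple translate-and-scale stand-in rather than the paper's transformation $T$, an in-memory list stands in for the vector database, and the class `ReferenceStore` and the labels `mudra_a` and `mudra_b` are hypothetical.

```python
import numpy as np

NUM_LANDMARKS = 21  # assumption: 21 landmarks x 3 coordinates = 63 values


def normalize(landmarks):
    """Stand-in normalization (NOT the paper's transformation T):
    move the first landmark to the origin and scale to unit size."""
    pts = np.asarray(landmarks, dtype=float).reshape(NUM_LANDMARKS, 3)
    pts = pts - pts[0]
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts = pts / scale
    return pts.reshape(-1)  # flattened 63-dimensional vector


class ReferenceStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.vectors = []
        self.labels = []

    def add(self, landmarks, label):
        # Store one normalized reference vector with its class label.
        self.vectors.append(normalize(landmarks))
        self.labels.append(label)

    def classify(self, landmarks):
        # Return the label of the closest reference in Euclidean distance.
        query = normalize(landmarks)
        dists = np.linalg.norm(np.stack(self.vectors) - query, axis=1)
        best = int(np.argmin(dists))
        return self.labels[best], float(dists[best])


# One-shot example with synthetic landmarks (no real pose data).
rng = np.random.default_rng(0)
proto_a = rng.normal(size=(NUM_LANDMARKS, 3))
proto_b = rng.normal(size=(NUM_LANDMARKS, 3))

store = ReferenceStore()
store.add(proto_a, "mudra_a")
store.add(proto_b, "mudra_b")

# The same hand shape, scaled and shifted, should match its reference.
query = proto_a * 2.5 + np.array([0.3, -0.1, 0.2])
label, dist = store.classify(query)
print(label, dist)
```

Because classification only compares a query against stored vectors, adding a new class or more samples in this sketch means calling `add` again; nothing is retrained.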
