Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models
Mohammed Mohiuddin, Syed Mohammod Minhaz Hossain, Sumaiya Khanam, Prionkar Barua, Aparup Barua, MD Tamim Hossain
TL;DR
This work introduces Yoga-16, a balanced 16-pose dataset, and benchmarks direct-image and skeleton-based inputs across three CNN architectures (VGG16, ResNet50, and Xception). Skeleton-based inputs outperform raw images, with MediaPipe Pose skeletons and VGG16 reaching the highest accuracy (96.09%); Grad-CAM interpretability analysis and cross-validation results indicate that the models generalize well. The study demonstrates the practical value of skeletal pose representations for robust yoga pose classification and provides a framework for evaluating real-time fitness applications. Taken together, the findings suggest that skeleton-focused pipelines can improve accuracy, robustness, and explainability in digital health and wellness tools.
Abstract
Yoga is popular worldwide for its spiritual and physical health benefits, but incorrect postures can lead to injuries. Automated yoga pose classification has therefore gained importance as a way to reduce reliance on expert practitioners. While human pose keypoint extraction models have shown high potential in action recognition, systematic benchmarking for yoga pose recognition remains limited, as prior works often focus solely on raw images or on a single pose extraction model. In this study, we introduce a curated dataset, 'Yoga-16', which addresses limitations of existing datasets, and systematically evaluate three deep learning architectures (VGG16, ResNet50, and Xception) on three input modalities (direct images, MediaPipe Pose skeleton images, and YOLOv8 Pose skeleton images). Our experiments demonstrate that skeleton-based representations outperform raw image inputs, with the highest accuracy of 96.09% achieved by VGG16 with MediaPipe Pose skeleton input. Additionally, we provide an interpretability analysis using Grad-CAM, which offers insight into model decision-making for yoga pose classification, together with a cross-validation analysis.
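To make the notion of a "skeleton image" input concrete, the following is a minimal sketch of how detected keypoints might be rasterized onto a blank canvas before being passed to a CNN. It is not the authors' pipeline: the function names, the four-point toy skeleton, the edge list, and the 224x224 canvas size are all assumptions chosen for illustration, and a real system would take its keypoints and connections from a pose estimator such as MediaPipe Pose or YOLOv8 Pose.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # normalized (x, y), each assumed to lie in [0, 1]
Edge = Tuple[int, int]        # indices of two keypoints joined by a "bone"


def draw_line(canvas: List[List[int]], p0: Tuple[int, int], p1: Tuple[int, int]) -> None:
    """Mark the pixels on the segment p0 -> p1 using linear interpolation."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(steps + 1):
        t = i / steps
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        canvas[y][x] = 1


def render_skeleton(keypoints: List[Point], edges: List[Edge], size: int = 224) -> List[List[int]]:
    """Return a size x size binary image (rows of 0/1) with the skeleton drawn on it.

    The default size of 224 is an assumption for this sketch, not a value
    taken from the paper.
    """
    canvas = [[0] * size for _ in range(size)]
    # Convert normalized coordinates to pixel coordinates, clamped to the canvas.
    pixels = [
        (min(size - 1, max(0, int(x * size))), min(size - 1, max(0, int(y * size))))
        for x, y in keypoints
    ]
    for a, b in edges:
        draw_line(canvas, pixels[a], pixels[b])
    return canvas


# Hypothetical toy skeleton: head, hip, left foot, right foot.
toy_keypoints = [(0.5, 0.1), (0.5, 0.5), (0.3, 0.9), (0.7, 0.9)]
toy_edges = [(0, 1), (1, 2), (1, 3)]
skeleton_image = render_skeleton(toy_keypoints, toy_edges)
```

The resulting image contains only the body's geometric structure, with background, clothing, and lighting removed, which is one plausible reason a skeleton input could be easier for a classifier to learn from than a raw photograph.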
