Sign language recognition from skeletal data using graph and recurrent neural networks

B. Mederos, J. Mejía, A. Medina-Reyes, Y. Espinosa-Almeyda, J. D. Díaz-Roman, I. Rodríguez-Mederos, M. Mejía-Carreon, F. Gonzalez-Lopez

TL;DR

The paper addresses isolated sign language recognition (ISLR) from skeletal data by proposing a Graph-GRU temporal network that combines graph neural networks for spatial pose modeling with gated recurrent units (GRUs) for temporal dynamics. The model takes a sequence of graphs as input, passes it through stacked spatio-temporal blocks with residual connections, and applies a temporal attention mechanism to produce a compact representation that feeds a 200-class classifier. Evaluated on the AUTSL dataset with PoseNet-derived 2D skeletons, the approach achieves about 90% validation accuracy and outperforms selected RGB- and skeleton-based baselines while offering favorable training and inference efficiency. The work demonstrates the viability and scalability of pose-driven ISLR and provides a foundation for extending to continuous sign language recognition and multimodal architectures.
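The temporal attention step mentioned above can be illustrated with a minimal sketch. This summary does not give the paper's exact attention formulation, so the code assumes a common variant: each frame's feature vector is scored by a dot product with a learned vector, the scores are normalized with a softmax over time, and the frames are combined as a weighted sum. The function name `temporal_attention_pool` and the parameter `score_weights` are hypothetical, and the input values are made up for illustration.

```python
import math

def temporal_attention_pool(frames, score_weights):
    """Collapse T per-frame feature vectors into one vector.

    frames: list of T feature vectors (each a list of D floats), e.g. the
        per-frame outputs of a recurrent layer.
    score_weights: hypothetical learned vector of D floats used to score
        frames (an assumption, not taken from the paper).
    """
    # One scalar relevance score per frame (dot product with the scoring vector).
    scores = [sum(w * x for w, x in zip(score_weights, frame)) for frame in frames]
    # Softmax over time, shifted by the max score for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The attention-weighted sum of the frames is the compact representation.
    dim = len(frames[0])
    pooled = [sum(a * frame[d] for a, frame in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Toy input: three frames with two features each (illustrative values only).
frames = [[0.0, 1.0], [2.0, 0.0], [0.0, 1.0]]
pooled, weights = temporal_attention_pool(frames, score_weights=[1.0, 0.0])
```

In this toy input the middle frame has the highest score, so it receives the largest weight and dominates the pooled vector. In a full model, a vector like `pooled` would be what is passed to the classifier.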

Abstract

This work presents an approach for recognizing isolated sign language gestures using skeleton-based pose data extracted from video sequences. A Graph-GRU temporal network is proposed to model both spatial and temporal dependencies between frames, enabling accurate classification. The model is trained and evaluated on the AUTSL (Ankara University Turkish Sign Language) dataset, achieving high accuracy. Experimental results demonstrate the effectiveness of integrating graph-based spatial representations with temporal modeling, providing a scalable framework for sign language recognition. The results of this approach highlight the potential of pose-driven methods for sign language understanding.

Paper Structure

This paper contains 8 sections, 20 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Evolution of the training and validation loss during training
  • Figure 2: Evolution of the training and validation loss during training