Enhancing ASL Recognition with GCNs and Successive Residual Connections
Ushnish Sarkar, Archisman Chakraborti, Tapas Samanta, Sarbajit Pal, Amitabha Das
TL;DR
The paper tackles ASL recognition by modeling hand landmarks as graphs, addressing the limitations of grid-based CNNs on non-Euclidean hand geometry. It builds a graph from the 21 MediaPipe hand landmarks in each frame, applies translational normalization (e.g., $x'_i = x_i - x_0$) and scale normalization (pairwise distances $d_{ij} = \lVert x_i - x_j \rVert_2$ with a scale factor), and processes the graphs with a 3-layer GCN that uses successive residual connections to stabilize training. The approach achieves a validation accuracy of $99.14\%$ on the ASL Alphabet dataset under 5-fold cross-validation, outperforming prior methods and establishing a new benchmark. The pipeline enables robust, potentially real-time ASL translation and can be extended to larger and more diverse sign-language datasets for improved assistive human-computer interaction.
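The normalization step described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function name `normalize_landmarks` is hypothetical, and the choice of scale factor (the largest pairwise distance $d_{ij}$) is an assumption, since the summary only says that a scale factor is used.

```python
import numpy as np


def normalize_landmarks(points):
    """Translate and rescale one hand's landmarks.

    points: array of shape (21, 3), one row per landmark; row 0 plays the
    role of x_0 in the translation step.
    """
    points = np.asarray(points, dtype=float)

    # Translational normalization: x'_i = x_i - x_0
    centered = points - points[0]

    # Pairwise Euclidean distances d_ij = ||x_i - x_j||_2
    diffs = centered[:, None, :] - centered[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # ASSUMPTION: use the largest pairwise distance as the scale factor.
    scale = dists.max()
    if scale == 0.0:
        return centered  # degenerate input: all landmarks coincide
    return centered / scale
```

With this choice, a hand that is shifted or uniformly resized in the frame maps to the same normalized coordinates, which is the consistency property the preprocessing is meant to provide.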
Abstract
This study presents a novel approach for enhancing American Sign Language (ASL) recognition using Graph Convolutional Networks (GCNs) integrated with successive residual connections. The method leverages the MediaPipe framework to extract key landmarks from each hand gesture, which are then used to construct graph representations. A robust preprocessing pipeline, including translational and scale normalization techniques, ensures consistency across the dataset. The constructed graphs are fed into a GCN-based neural architecture with residual connections to improve network stability. The architecture achieves state-of-the-art results, demonstrating superior generalization capabilities with a validation accuracy of 99.14%.
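To make the architecture concrete, the following is a minimal NumPy sketch of a forward pass through stacked graph-convolution layers with residual connections. It rests on several assumptions not confirmed by the abstract: the graph edges follow the standard MediaPipe hand skeleton, each layer uses the common symmetric-normalized propagation rule $\hat{A} H W$ with self-loops, the residual has the form $H_{l+1} = \mathrm{ReLU}(\hat{A} H_l W_l) + H_l$, and a mean readout produces the graph-level feature. The names `HAND_EDGES`, `normalized_adjacency`, and `gcn_residual_forward` are hypothetical.

```python
import numpy as np

# ASSUMPTION: edges follow the MediaPipe hand skeleton over 21 landmarks.
HAND_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),          # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),          # index finger
    (5, 9), (9, 10), (10, 11), (11, 12),     # middle finger
    (9, 13), (13, 14), (14, 15), (15, 16),   # ring finger
    (13, 17), (0, 17), (17, 18), (18, 19), (19, 20),  # pinky and palm
]


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    a = np.eye(num_nodes)  # self-loops
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_residual_forward(x, a_hat, w_in, weights):
    """Forward pass: input projection, then residual GCN layers.

    x:       (num_nodes, in_dim) node features, e.g. normalized coordinates
    a_hat:   (num_nodes, num_nodes) normalized adjacency
    w_in:    (in_dim, hidden) projection so the residual shapes match
    weights: list of (hidden, hidden) matrices, one per GCN layer
    """
    h = x @ w_in
    for w in weights:
        # Graph convolution followed by ReLU, plus the skip connection.
        h = np.maximum(a_hat @ h @ w, 0.0) + h
    return h.mean(axis=0)  # mean readout over nodes
```

Passing three weight matrices in `weights` gives a 3-layer stack; the resulting vector would then feed a classifier over the ASL letters. In practice a framework such as PyTorch Geometric would supply trainable layers, so this sketch only illustrates how the skip connection carries each layer's input forward.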
