SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition
Hasan Algafri, Hamzah Luqman, Sarah Alyami, Issam Laradji
TL;DR
Isolated sign language recognition (SLR) suffers from limited labeled data, which hinders robust, signer-independent performance. The authors propose SSLR, a pose-based semi-supervised framework that uses pseudo-labeling to leverage unlabeled data with a Transformer-based backbone (SPOTER). Experiments on the WLASL-100 dataset across various labeled-data fractions show that SSLR frequently matches or exceeds fully supervised baselines, with notable gains as labeled data increases and reduced training time when data is scarce. The work reduces labeling requirements for SLR and points to future enhancements, such as uncertainty-aware pseudo-labeling and class-balanced strategies, to further improve performance in low-resource settings.
Abstract
Sign language is the primary means of communication for people with disabling hearing loss. Sign language recognition (SLR) systems aim to recognize sign gestures and translate them into spoken language. One of the main challenges in SLR is the scarcity of annotated datasets. To address this issue, we propose a semi-supervised learning (SSL) approach for SLR (SSLR) that employs a pseudo-labeling method to annotate unlabeled samples. Sign gestures are represented using pose information that encodes the signer's skeletal joint points, and this representation serves as input to the Transformer backbone of the proposed approach. To demonstrate the learning capabilities of SSL across various labeled data sizes, we conducted several experiments using different percentages of labeled data and varying numbers of classes. We compared the SSL approach with a fully supervised model on the WLASL-100 dataset. In many cases, the SSL model outperformed the supervised model while using less labeled data.
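
The pseudo-labeling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how pseudo-labels are selected, so the confidence-threshold rule below is a common generic variant and an assumption, not the authors' procedure. The function names, the 0.9 threshold, and the toy logits are all hypothetical; in practice the logits would come from the pose-based Transformer applied to unlabeled sign samples.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(logits_batch, threshold=0.9):
    """Keep unlabeled samples whose top-class probability reaches the threshold.

    Returns a list of (sample_index, pseudo_label, confidence) tuples.
    The threshold value is an illustrative assumption.
    """
    selected = []
    for index, logits in enumerate(logits_batch):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((index, label, probs[label]))
    return selected


# Toy model outputs for three unlabeled samples over three sign classes.
unlabeled_logits = [
    [4.0, 0.5, 0.1],  # confident prediction for class 0
    [1.0, 1.1, 0.9],  # ambiguous prediction, should be skipped
    [0.2, 0.1, 5.0],  # confident prediction for class 2
]

picked = select_pseudo_labels(unlabeled_logits, threshold=0.9)
for index, label, confidence in picked:
    print(f"sample {index}: pseudo-label {label} (confidence {confidence:.3f})")
```

In a training loop of this kind, the selected samples would be added to the labeled pool with their pseudo-labels and the model retrained or fine-tuned, while low-confidence samples stay unlabeled for a later round.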
