SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition

Hasan Algafri, Hamzah Luqman, Sarah Alyami, Issam Laradji

TL;DR

Isolated sign language recognition suffers from limited labeled data, hindering robust, signer-independent performance. The authors propose SSLR, a pose-based semi-supervised framework that uses pseudo-labeling to leverage unlabeled data within a Transformer-backed architecture (SPOTER). Experiments on the WLASL-100 dataset across various labeled-data fractions show SSLR frequently matches or exceeds fully supervised baselines, with notable gains as labeled data increases and with reduced training time when data is scarce. The work reduces labeling requirements for SLR and points to future enhancements such as uncertainty-aware pseudo-labeling and class-balanced strategies to further improve performance in low-resource settings.

Abstract

Sign language is the primary communication language for people with disabling hearing loss. Sign language recognition (SLR) systems aim to recognize sign gestures and translate them into spoken language. One of the main challenges in SLR is the scarcity of annotated datasets. To address this issue, we propose a semi-supervised learning (SSL) approach for SLR (SSLR), employing a pseudo-label method to annotate unlabeled samples. The sign gestures are represented using pose information that encodes the signer's skeletal joint points. This information is used as input for the Transformer backbone model utilized in the proposed approach. To demonstrate the learning capabilities of SSL across various labeled data sizes, several experiments were conducted using different percentages of labeled data with varying numbers of classes. The performance of the SSL approach was compared with a fully supervised learning-based model on the WLASL-100 dataset. The obtained results of the SSL model outperformed the supervised learning-based model with less labeled data in many cases.

Paper Structure

This paper contains 9 sections, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: The pipeline of the proposed SSL model. The model is (1) trained on labeled signs using the pose-based transformer model. This model is used later to (2) predict signs of the unlabeled samples. Then, signs of unlabeled samples predicted with high confidence are (3) selected and added to the labeled set. This updated set is used to (4) re-train the model. This process is repeated until all unlabeled samples are labeled.
  • Figure 2: Architecture of the utilized Transformer model [Bohacek2022].
  • Figure 3: The accuracies of the SSL model with different numbers of classes. The x-axis represents the percentage of labeled data used to train the model.
  • Figure 4: The performance of the SSL model on 40 classes with different percentages of labeled data. The horizontal axis represents the training cycle that involves an incremental increase of the pseudo-labeled samples.
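
The four-step cycle described in the Figure 1 caption (train, predict, select confident predictions, re-train) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. A nearest-centroid classifier stands in for the pose-based Transformer, confidence is taken as a softmax over negative distances, and the `threshold` value, the `max_cycles` cap, and the early stop when no sample passes the threshold are all hypothetical choices. The function names are illustrative only.

```python
import math
from collections import defaultdict


def train_centroids(samples, labels):
    """Stand-in for training the pose-based model: one mean vector per class."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_with_confidence(centroids, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    classes = sorted(centroids)
    scores = [-math.dist(x, centroids[c]) for c in classes]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for numerical stability
    best = max(range(len(classes)), key=lambda i: exps[i])
    return classes[best], exps[best] / sum(exps)


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.9, max_cycles=10):
    """Pseudo-labeling loop: threshold and max_cycles are assumed values."""
    labeled_x, labeled_y = list(labeled_x), list(labeled_y)
    remaining = list(unlabeled_x)
    model = train_centroids(labeled_x, labeled_y)  # (1) train on labeled data
    for _ in range(max_cycles):
        if not remaining:
            break  # every unlabeled sample has received a pseudo-label
        still_unlabeled = []
        added = 0
        for x in remaining:
            label, conf = predict_with_confidence(model, x)  # (2) predict
            if conf >= threshold:  # (3) keep only high-confidence predictions
                labeled_x.append(x)
                labeled_y.append(label)
                added += 1
            else:
                still_unlabeled.append(x)
        remaining = still_unlabeled
        if added == 0:
            break  # assumption: stop when no sample clears the threshold
        model = train_centroids(labeled_x, labeled_y)  # (4) re-train
    return model, labeled_x, labeled_y, remaining


# Toy usage: 2-D points stand in for pose feature vectors.
model, xs, ys, rest = self_train(
    [(0.0, 0.0), (10.0, 10.0)], ["a", "b"],
    [(1.0, 0.0), (9.0, 10.0), (5.0, 5.0)],
)
```

In this toy run the two points near a class centroid are pseudo-labeled in the first cycle, while the point equidistant from both centroids never clears the threshold and stays in `rest`. The caption describes repeating until all unlabeled samples are labeled, so a real setup would need some policy for such low-confidence leftovers; the early stop here is only one possible choice.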