Sign Language Recognition using Bidirectional Reservoir Computing

Nitin Kumar Singh, Arie Rachmad Syulistyo, Yuichiro Tanaka, Hakaru Tamukoh

TL;DR

This paper targets the efficiency limitations of deep-learning-based sign language recognition (SLR) by introducing a lightweight pipeline that combines MediaPipe landmark extraction with an echo state network (ESN)-based bidirectional reservoir computing (BRC) framework. The reservoir processes landmark sequences in both forward and reverse temporal order, and only the output layer is trained, via ridge regression. The approach achieves 57.71% accuracy on the WLASL 100 dataset with a training time of 9 seconds. Compared with a Bi-GRU baseline, the BRC method trains substantially faster while maintaining competitive accuracy, making it well-suited for edge devices. Overall, the work demonstrates that reservoir-computing-based SLR can deliver real-time performance with low resource requirements while remaining robust to signer variability.
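The TL;DR notes that only the output layer is trained, via ridge regression. The minimal NumPy sketch below shows what such a readout can look like, assuming each video has already been reduced to one fixed-length reservoir feature vector and that sign classes are encoded as one-hot targets. The function names, the bias column, and the ridge value `1e-2` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def fit_ridge_readout(features, labels, num_classes, ridge=1e-2):
    """Fit a linear readout with closed-form ridge regression (illustrative sketch).

    features: (num_samples, feature_dim) reservoir representations (assumed input).
    labels:   (num_samples,) integer class indices.
    Returns W_out of shape (feature_dim + 1, num_classes); the last row acts as a bias.
    """
    # Append a constant column so the readout can learn a bias term.
    # For simplicity the bias is regularized along with the other weights.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    # One-hot targets: one column per sign class.
    Y = np.eye(num_classes)[labels]
    # Ridge solution W_out = (X^T X + lambda * I)^{-1} X^T Y, computed with a linear solve.
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


def predict(features, w_out):
    """Return the class with the largest readout score for each sample."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return np.argmax(X @ w_out, axis=1)


# Usage on synthetic, well-separated two-class data (not WLASL features).
rng = np.random.default_rng(0)
demo_features = np.vstack([rng.normal(-2.0, 0.5, (20, 4)), rng.normal(2.0, 0.5, (20, 4))])
demo_labels = np.array([0] * 20 + [1] * 20)
w_out_demo = fit_ridge_readout(demo_features, demo_labels, num_classes=2)
accuracy = float(np.mean(predict(demo_features, w_out_demo) == demo_labels))
print(accuracy)
```

Because training is a single linear solve whose cost depends on the feature dimension rather than on iterative gradient updates, a readout of this kind is one plausible reason reservoir approaches train in seconds.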

Abstract

Sign language recognition (SLR) facilitates communication between deaf and hearing individuals. Deep learning is widely used to develop SLR systems; however, it is computationally intensive and demands substantial resources, making it unsuitable for resource-constrained devices. To address this, we propose an efficient SLR system that combines MediaPipe with an echo state network (ESN)-based bidirectional reservoir computing (BRC) architecture. MediaPipe extracts hand joint coordinates, which serve as inputs to the BRC architecture. The BRC processes these features in both the forward and backward directions, efficiently capturing temporal dependencies, and the resulting forward and backward states are concatenated to form a robust representation for classification. We evaluated our method on the Word-Level American Sign Language (WLASL) video dataset, achieving a competitive accuracy of 57.71% with a training time of only 9 seconds, compared with 55 minutes and 38 seconds for the deep-learning-based Bi-GRU approach. Consequently, the BRC-based SLR system is well-suited for edge devices.
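As a hedged illustration of the forward and backward processing described in the abstract, the sketch below runs two fixed random echo state reservoirs over a landmark sequence, one in temporal order and one on the time-reversed sequence, and concatenates their final states. The leaky-integrator update, the hyperparameters (reservoir size 200, spectral radius 0.9, leak rate 0.3), the use of two independently initialized reservoirs, and the choice of final states as the sequence summary are all assumptions, not details reported in the paper. The 126-dimensional input in the usage lines assumes 2 hands, 21 landmarks per hand, and 3 coordinates per landmark.

```python
import numpy as np


class EchoStateReservoir:
    """Minimal leaky-integrator echo state network with fixed random weights (sketch)."""

    def __init__(self, input_dim, reservoir_dim=200, spectral_radius=0.9,
                 input_scale=0.5, leak_rate=0.3, seed=0):
        rng = np.random.default_rng(seed)
        # Input and recurrent weights are random and are never trained.
        self.w_in = rng.uniform(-input_scale, input_scale, (reservoir_dim, input_dim))
        w = rng.uniform(-0.5, 0.5, (reservoir_dim, reservoir_dim))
        # Rescale the recurrent matrix so its largest eigenvalue magnitude
        # equals the chosen spectral radius (a common ESN heuristic).
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w = w
        self.leak_rate = leak_rate
        self.reservoir_dim = reservoir_dim

    def final_state(self, sequence):
        """Run the reservoir over a (T, input_dim) sequence and return the last state."""
        x = np.zeros(self.reservoir_dim)
        for u in sequence:
            pre = np.tanh(self.w_in @ u + self.w @ x)
            # Leaky integration: blend the previous state with the new activation.
            x = (1.0 - self.leak_rate) * x + self.leak_rate * pre
        return x


def bidirectional_features(forward_res, backward_res, sequence):
    """Concatenate final states of a forward pass and a time-reversed pass."""
    h_fwd = forward_res.final_state(sequence)
    h_bwd = backward_res.final_state(sequence[::-1])
    return np.concatenate([h_fwd, h_bwd])


# Usage with synthetic data standing in for 30 frames of hand landmarks.
rng = np.random.default_rng(42)
landmark_seq = rng.normal(size=(30, 126))
fwd = EchoStateReservoir(126, seed=1)
bwd = EchoStateReservoir(126, seed=2)
rep = bidirectional_features(fwd, bwd, landmark_seq)
print(rep.shape)  # → (400,)
```

The concatenated vector is what a trained linear readout would consume; the reservoirs themselves stay fixed, so only the readout needs fitting.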

Paper Structure

This paper contains 9 sections, 3 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Signs used by different signers for activities like painting, studying, and reading.
  • Figure 2: Feature extraction using MediaPipe
  • Figure 3: SLR system using bidirectional reservoir computing
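Figure 2 depicts feature extraction with MediaPipe. The sketch below shows one hypothetical way to turn per-frame hand landmarks into the fixed-length sequence a reservoir consumes. It does not call MediaPipe; it assumes the 21 (x, y, z) landmarks per hand that MediaPipe Hands reports have already been converted to NumPy arrays. Using both hands, zero-filling an undetected hand, and expressing coordinates relative to the wrist (landmark 0) are illustrative assumptions, not the paper's documented preprocessing.

```python
import numpy as np

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per detected hand
COORDS = 3          # x, y, z per landmark


def frame_features(left_hand, right_hand):
    """Flatten one frame's hand landmarks into a 126-value vector (hypothetical layout).

    Each argument is a (21, 3) array of (x, y, z) coordinates, or None when that
    hand was not detected; missing hands are zero-filled (an assumption).
    """
    parts = []
    for hand in (left_hand, right_hand):
        if hand is None:
            parts.append(np.zeros(NUM_LANDMARKS * COORDS))
        else:
            hand = np.asarray(hand, dtype=float)
            # Subtract the wrist position (landmark 0) so the features do not
            # depend on where the hand appears in the frame (an assumption).
            parts.append((hand - hand[0]).ravel())
    return np.concatenate(parts)


def video_features(frames):
    """Stack per-frame vectors into a (T, 126) sequence for the reservoir."""
    return np.stack([frame_features(left, right) for left, right in frames])


# Usage with random stand-in landmarks: frame 2 has no right hand detected.
rng = np.random.default_rng(0)
frames = [(rng.random((21, 3)), rng.random((21, 3))), (rng.random((21, 3)), None)]
seq = video_features(frames)
print(seq.shape)  # → (2, 126)
```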