
SignSpeak: Open-Source Time Series Classification for ASL Translation

Aditya Makkar, Divya Makkar, Aarav Patel, Liam Hebert

TL;DR

SignSpeak addresses the communication barrier posed by limited ASL fluency. Instead of a vision-based system, it uses a low-cost glove fitted with five flex sensors, together with an open 7200-sample dataset covering 36 classes (A–Z, 1–10) collected at 36 Hz. The work benchmarks time-series classifiers (LSTM, GRU, Transformer) on this data. A stacked GRU achieves the best accuracy, about 0.922, while Transformers show limited gains given the small number of data channels (five sensors). The work emphasizes open-source availability of the data, models, and hardware. It also discusses generalization gaps relative to private datasets and specific misclassifications (e.g., confusions between E and L). Overall, SignSpeak provides a practical, cost-effective foundation for real-time ASL translation research on embedded devices and sets a reproducible baseline for future work.
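The sketch below models one SignSpeak sample as a labeled sequence of five-value frames. It also works through the arithmetic implied by the figures above. 7200 samples over 36 classes gives 200 samples per class, but only if the classes are balanced, which the summary does not state. A 36 Hz rate means roughly 27.8 ms between frames. The `SignSample` class, the label ordering, and the frame layout are hypothetical, and the released dataset's actual file format may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SAMPLE_RATE_HZ = 36   # sampling rate stated in the summary
N_SENSORS = 5         # five flex sensors, one per finger
N_SAMPLES = 7200      # total samples in the dataset

# Hypothetical label order: letters A-Z, then numbers 1-10 (36 classes total).
LABELS: List[str] = [chr(c) for c in range(ord("A"), ord("Z") + 1)] + [str(n) for n in range(1, 11)]
LABEL_TO_INDEX: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}

Frame = Tuple[float, ...]  # one reading per flex sensor at a single time step


@dataclass
class SignSample:
    """Hypothetical container for one recorded sign (not the dataset's real format)."""
    label: str
    frames: List[Frame]

    def __post_init__(self) -> None:
        if self.label not in LABEL_TO_INDEX:
            raise ValueError(f"unknown label: {self.label!r}")
        if any(len(f) != N_SENSORS for f in self.frames):
            raise ValueError("every frame must hold one value per sensor")

    def duration_s(self) -> float:
        # Number of frames divided by the sampling rate.
        return len(self.frames) / SAMPLE_RATE_HZ

    def channel(self, sensor: int) -> List[float]:
        # Time series for a single finger's flex sensor.
        return [frame[sensor] for frame in self.frames]


samples_per_class = N_SAMPLES // len(LABELS)   # 200, assuming balanced classes
frame_period_ms = 1000 / SAMPLE_RATE_HZ        # about 27.8 ms between readings

example = SignSample("A", [(0.1, 0.2, 0.3, 0.4, 0.5)] * 72)  # 72 frames = 2 s
print(samples_per_class, round(frame_period_ms, 1), example.duration_s())  # prints: 200 27.8 2.0
```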

Abstract

The lack of fluency in sign language remains a barrier to seamless communication for hearing- and speech-impaired communities. In this work, we propose a low-cost, real-time ASL-to-speech translation glove and an exhaustive training dataset of sign language patterns. We then benchmark this dataset with supervised learning models, such as LSTMs, GRUs, and Transformers; our best model achieves 92% accuracy. The SignSpeak dataset has 7200 samples spanning 36 classes (A-Z, 1-10). It aims to capture realistic signing patterns by using five low-cost flex sensors to measure finger positions at each time step, sampled at 36 Hz. Our open-source dataset, models, and glove designs provide an accurate and efficient ASL translator while maintaining cost-effectiveness, establishing a framework for future work to build on.
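The abstract lists GRUs among the benchmarked models, and the TL;DR reports that a stacked GRU performs best. The dependency-free sketch below illustrates what "stacked" means in this context. It runs two GRU layers over a sequence of five-channel readings and maps the top layer's final hidden state to 36 class scores. The following are all assumptions for illustration: the hidden size, the number of layers, the gate formulation (a common PyTorch-style GRU), the label order, and the names `GRULayer` and `StackedGRUClassifier`. The weights are random and untrained, so predictions are meaningless until the model is trained. This is not the authors' implementation.

```python
import math
import random

# Hypothetical label order: letters A-Z, then numbers 1-10 (36 classes).
LABELS = [chr(c) for c in range(ord("A"), ord("Z") + 1)] + [str(n) for n in range(1, 11)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRULayer:
    """One GRU layer with randomly initialised (untrained) weights."""

    def __init__(self, rng, input_size, hidden_size):
        self.hidden_size = hidden_size
        # Input weights W, recurrent weights U and bias b for each gate:
        # update (z), reset (r) and candidate (n).
        self.W = {g: rand_matrix(rng, hidden_size, input_size) for g in "zrn"}
        self.U = {g: rand_matrix(rng, hidden_size, hidden_size) for g in "zrn"}
        self.b = {g: [0.0] * hidden_size for g in "zrn"}

    def step(self, x, h):
        wz, uz = matvec(self.W["z"], x), matvec(self.U["z"], h)
        wr, ur = matvec(self.W["r"], x), matvec(self.U["r"], h)
        wn, un = matvec(self.W["n"], x), matvec(self.U["n"], h)
        z = [sigmoid(a + c + bias) for a, c, bias in zip(wz, uz, self.b["z"])]
        r = [sigmoid(a + c + bias) for a, c, bias in zip(wr, ur, self.b["r"])]
        # The reset gate scales the recurrent contribution to the candidate state.
        n = [math.tanh(a + ri * c + bias) for a, ri, c, bias in zip(wn, r, un, self.b["n"])]
        # The update gate interpolates between the previous state and the candidate.
        return [(1 - zi) * ni + zi * hp for zi, ni, hp in zip(z, n, h)]

    def run(self, seq):
        # Return the hidden state at every time step.
        h = [0.0] * self.hidden_size
        outputs = []
        for x in seq:
            h = self.step(x, h)
            outputs.append(h)
        return outputs


class StackedGRUClassifier:
    """Hypothetical stacked GRU: each layer consumes the hidden sequence of the layer below."""

    def __init__(self, n_sensors=5, hidden_size=16, num_layers=2, num_classes=36, seed=0):
        rng = random.Random(seed)
        sizes = [n_sensors] + [hidden_size] * num_layers
        self.layers = [GRULayer(rng, sizes[i], sizes[i + 1]) for i in range(num_layers)]
        self.W_out = rand_matrix(rng, num_classes, hidden_size)
        self.b_out = [0.0] * num_classes

    def logits(self, seq):
        for layer in self.layers:
            seq = layer.run(seq)
        last = seq[-1]  # final hidden state of the top layer
        return [v + bias for v, bias in zip(matvec(self.W_out, last), self.b_out)]

    def predict(self, seq):
        scores = self.logits(seq)
        return LABELS[max(range(len(scores)), key=scores.__getitem__)]


# Usage on one second of synthetic five-sensor readings (36 frames at 36 Hz).
rng = random.Random(1)
sample = [[rng.random() for _ in range(5)] for _ in range(36)]
model = StackedGRUClassifier()
print(model.predict(sample))
```

Stacking lets the upper layer model temporal patterns in the lower layer's features rather than in the raw sensor values. Using only the final hidden state suits whole-sequence classification of a single sign.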

Paper Structure

This paper contains 8 sections, 7 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Circuit of data collection glove.
  • Figure 2: Confusion matrix of the Encoder model for classes 'A' to 'M', highlighting the classes with the poorest performance.
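Figure 2 uses a confusion matrix to locate the weakest classes. The sketch below shows how such a matrix can be computed from true and predicted labels, along with per-class recall and the most frequent off-diagonal confusions. The function names are hypothetical. The toy predictions, which confuse E with L in line with the confusion mentioned in the TL;DR, are invented for illustration and are not results from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    # Rows are true classes, columns are predicted classes.
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_recall(matrix: List[List[int]], labels: Sequence[str]) -> Dict[str, float]:
    # Diagonal count divided by row total; NaN for classes with no samples.
    recall = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        recall[label] = matrix[i][i] / total if total else math.nan
    return recall


def top_confusions(matrix: List[List[int]], labels: Sequence[str],
                   k: int = 3) -> List[Tuple[str, str, int]]:
    # Largest off-diagonal cells as (true label, predicted label, count).
    pairs = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and matrix[i][j] > 0
    ]
    pairs.sort(reverse=True)
    return [(true, pred, count) for count, true, pred in pairs[:k]]


# Toy example over classes A to M (synthetic predictions, not paper results).
labels = [chr(c) for c in range(ord("A"), ord("M") + 1)]
y_true = ["E", "E", "E", "L", "L", "A"]
y_pred = ["E", "L", "L", "L", "E", "A"]
cm = confusion_matrix(y_true, y_pred, labels)
print(top_confusions(cm, labels))  # [('E', 'L', 2), ('L', 'E', 1)]
```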