SpellRing: Recognizing Continuous Fingerspelling in American Sign Language using a Ring

Hyunchul Lim, Nam Anh Dang, Dylan Lee, Tianhong Catherine Yu, Jane Lu, Franklin Mingzhe Li, Yiqi Jin, Yan Ma, Xiaojun Bi, François Guimbretière, Cheng Zhang

TL;DR

SpellRing tackles continuous ASL fingerspelling recognition with a minimally obtrusive wearable: a thumb ring that combines active acoustic sensing with an IMU. It implements a multimodal CNN-CTC pipeline with data augmentation and lexical correction to recognize words without letter-level labeling, and uses a language model to improve phrase-level decoding. In two user studies with 20 signers, SpellRing achieves 82.45% top-1 offline word accuracy and a WER of 0.099 for real-time phrases on the MacKenzie-Soukoreff Phrase Set. ASL learners outperformed fluent signers, a gap attributed to differences in signing speed, and pre-training across participants further improves performance. The work demonstrates the viability of ring-based wearables for practical ASL text entry and provides design guidelines for scaling vocabulary, improving user-independence, and integrating with language models for naturalistic communication.
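
To illustrate why CTC removes the need for letter-level labels, the sketch below shows only the standard CTC collapse rule applied to a per-frame label sequence: merge consecutive repeats, then drop the blank symbol. It is a minimal illustration, not the SpellRing decoder; the blank symbol `_`, the function name `ctc_greedy_collapse`, and the example frames are assumptions made for this example.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real models usually reserve a class index


def ctc_greedy_collapse(frame_labels):
    """Apply the CTC collapse rule to one label per frame.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blanks, which separate genuine double letters.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


# Illustrative per-frame predictions for a fingerspelled word (made-up data).
frames = list("__hh_e_ll_l_oo__")
print(ctc_greedy_collapse(frames))
```

Because the blank between the two `l` runs survives the merge step, the double letter is preserved, which is how a CTC-trained model can be supervised with the word alone rather than with per-frame letter boundaries.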

Abstract

Fingerspelling is a critical part of American Sign Language (ASL) recognition and has become an accessible optional text entry method for Deaf and Hard of Hearing (DHH) individuals. In this paper, we introduce SpellRing, a single smart ring worn on the thumb that recognizes words continuously fingerspelled in ASL. SpellRing uses active acoustic sensing (via a microphone and speaker) and an inertial measurement unit (IMU) to track handshape and movement, which are processed through a deep learning algorithm using Connectionist Temporal Classification (CTC) loss. We evaluated the system with 20 ASL signers (13 fluent and 7 learners), using the MacKenzie-Soukoreff Phrase Set of 1,164 words and 100 phrases. Offline evaluation yielded top-1 and top-5 word recognition accuracies of 82.45% (9.67%) and 92.42% (5.70%), respectively. In real time, the system achieved a word error rate (WER) of 0.099 (0.039) on the phrases. Based on these results, we discuss key lessons and design implications for future minimally obtrusive ASL recognition wearables.
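
For readers unfamiliar with the metric, the following is a minimal sketch of the conventional word error rate: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It assumes whitespace tokenization and per-phrase scoring; the paper's exact evaluation script may differ, and the function name and example phrases are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words (made-up example).
wer = word_error_rate("my watch fell in the water", "my watch fell in water")
print(round(wer, 3))
```

Under this definition, a WER of 0.099 corresponds to roughly one word-level error per ten reference words.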

Paper Structure

This paper contains 57 sections, 2 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Handshape variation resulting from coarticulation of adjacent fingerspelled letters: Note that the final 'E' appears differently from other occurrences of the same letter, and 'I' is coarticulated with 'L' (in red).
  • Figure 2: Acoustic and IMU data over 26 isolated English/ASL alphabet letters and continuously fingerspelled words. Continuous fingerspelling adds complexity due to natural flow and quick transitions between letters, which alter sensor values depending on adjacent letters.
  • Figure 3: Fusion Model Framework
  • Figure 4: Hardware Prototype: (a) a 3.7V 70mAh LiPo battery, (b) an nRF MCU, (c) a customized Flexible Printed Circuit Board (FPCB) with a microphone and speaker, (d) an IMU sensor board (MPU6050), (e) an ESP32 Feather Board, and (f) a 3D-printed ring case.
  • Figure 5: Study Procedures: an English word with guide fingerspelling images (left) and examples of experimental setup locations (e.g., in a study room (center); in a home (right))
  • ...and 4 more figures