SpellRing: Recognizing Continuous Fingerspelling in American Sign Language using a Ring
Hyunchul Lim, Nam Anh Dang, Dylan Lee, Tianhong Catherine Yu, Jane Lu, Franklin Mingzhe Li, Yiqi Jin, Yan Ma, Xiaojun Bi, François Guimbretière, Cheng Zhang
TL;DR
SpellRing tackles continuous ASL fingerspelling recognition with a minimally obtrusive wearable approach: acoustic sensing and an IMU mounted on a thumb ring. It implements a multimodal CNN-CTC pipeline with data augmentation and lexical correction to recognize words without letter-level labeling, and uses a language model to improve phrase-level decoding. In two user studies with 20 signers, SpellRing achieves 82.45% top-1 offline accuracy and a 0.099 WER for real-time phrases on the MacKenzie-Soukoreff Phrase Set. ASL learners outperformed fluent signers, a difference attributed to signing speed, and pre-training on data pooled across participants further improves performance. The work demonstrates the viability of ring-based wearables for practical ASL text entry and provides design guidelines for scaling vocabulary, improving user independence, and integrating with language models for naturalistic communication.
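To make the CTC and lexical-correction steps concrete, the following is a minimal sketch, not SpellRing's implementation. It assumes greedy (best-path) CTC decoding over per-frame letter probabilities, with index 0 reserved for the CTC blank, followed by lexical correction that snaps the raw letter string to the vocabulary word with the smallest Levenshtein distance. The alphabet layout, the helper names (`ctc_greedy_decode`, `levenshtein`, `lexical_correct`, `one_hot`), and the toy vocabulary are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical label layout: index 0 is the CTC blank ("_"), then a..z.
BLANK = 0
ALPHABET = "_abcdefghijklmnopqrstuvwxyz"


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]]) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    letters = []
    prev = None
    for idx in best:
        # A letter is emitted only when it differs from the previous frame's
        # label; a blank between two identical letters keeps both (e.g. "ll").
        if idx != prev and idx != BLANK:
            letters.append(ALPHABET[idx])
        prev = idx
    return "".join(letters)


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def lexical_correct(raw: str, vocabulary: List[str]) -> str:
    """Hypothetical correction: pick the closest vocabulary word."""
    return min(vocabulary, key=lambda w: levenshtein(raw, w))


def one_hot(indices: List[int], size: int = len(ALPHABET)) -> List[List[float]]:
    """Build toy per-frame probability vectors for the demo below."""
    return [[1.0 if k == i else 0.0 for k in range(size)] for i in indices]


if __name__ == "__main__":
    # Frames h h e _ l l _ l o collapse to "hello".
    frames = one_hot([8, 8, 5, 0, 12, 12, 0, 12, 15])
    print(ctc_greedy_decode(frames))
    # A misrecognized string is snapped to the nearest vocabulary word.
    print(lexical_correct("helko", ["hello", "help", "world"]))
```

Greedy decoding is the simplest CTC decoder. Beam search combined with a language model, as the phrase-level decoding described above suggests, would replace `ctc_greedy_decode` in a fuller system.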
Abstract
Fingerspelling is a critical component of American Sign Language (ASL) recognition and has become an accessible alternative text entry method for Deaf and Hard of Hearing (DHH) individuals. In this paper, we introduce SpellRing, a single smart ring worn on the thumb that recognizes words continuously fingerspelled in ASL. SpellRing uses active acoustic sensing (via a microphone and speaker) and an inertial measurement unit (IMU) to track handshape and movement, and these signals are processed by a deep learning model trained with Connectionist Temporal Classification (CTC) loss. We evaluated the system with 20 ASL signers (13 fluent and 7 learners), using the MacKenzie-Soukoreff Phrase Set of 1,164 words and 100 phrases. Offline evaluation yielded top-1 and top-5 word recognition accuracies of 82.45% (9.67%) and 92.42% (5.70%), respectively. In real time, the system achieved a word error rate (WER) of 0.099 (0.039) on the phrases. Based on these results, we discuss key lessons and design implications for future minimally obtrusive ASL recognition wearables.
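The two evaluation metrics reported above can be illustrated with a short sketch. It assumes the standard definitions: WER is the word-level edit distance (substitutions + deletions + insertions) between the recognized and reference phrases, divided by the number of reference words, and top-k accuracy counts a word as correct when the target appears among the k highest-ranked candidates. The abstract does not spell out the exact scoring procedure, and the function names are hypothetical.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (r != h)))    # substitution or match
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of a hypothesis phrase against a reference phrase."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def top_k_accuracy(ranked: List[List[str]], targets: List[str], k: int) -> float:
    """Fraction of targets found among the first k ranked candidates."""
    hits = sum(t in preds[:k] for preds, t in zip(ranked, targets))
    return hits / len(targets)


if __name__ == "__main__":
    # One substituted word out of four reference words gives a WER of 0.25.
    print(wer("the quick brown fox", "the quik brown fox"))
    ranked = [["cat", "cut"], ["dog", "dig"]]
    print(top_k_accuracy(ranked, ["cut", "dog"], k=1))
    print(top_k_accuracy(ranked, ["cut", "dog"], k=2))
```

Because insertions are counted against the number of reference words, WER is not bounded by 1.0; a hypothesis with many spurious words can exceed it.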
