Towards an AI-Driven Video-Based American Sign Language Dictionary: Exploring Design and Usage Experience with Learners
Saad Hassan, Matyas Bohacek, Chaelin Kim, Denise Crochet
TL;DR
This work tackles the difficulty of looking up ASL signs without text queries by delivering a fully automated, privacy-preserving video-based dictionary built on state-of-the-art sign recognition. Leveraging a Transformer-based SPOTER architecture, the authors provide a feedback-rich UI that displays confidence levels and latency information, enabling learners to refine their video submissions. An observational study with 12 novice ASL learners using real tasks reveals benefits in search ease, exposure to signing variation, and comprehension, while also uncovering challenges related to recording unknown signs, output unpredictability, latency, and privacy concerns. The results yield practical design guidance for deployment, including real-time submission feedback, confidence-based result ranking, and strategies to reduce latency and background noise, with the prototype openly released for research use. Collectively, the paper advances video-based ASL dictionary design by integrating insights from prior Wizard-of-Oz (WoZ) studies with functional AI feedback in authentic learning contexts, highlighting both educational value and non-functional considerations like bias and privacy.
Abstract
Searching for unfamiliar American Sign Language (ASL) signs is challenging for learners because, unlike with spoken languages, they cannot type a text-based query to look up an unfamiliar sign. Advances in isolated sign recognition have enabled the creation of video-based dictionaries, allowing users to submit a video and receive a list of the closest matching signs. Previous HCI research using Wizard-of-Oz (WoZ) prototypes has explored interface designs for ASL dictionaries. Building on these studies, we incorporate their design recommendations and leverage state-of-the-art sign-recognition technology to develop an automated video-based dictionary. We also present findings from an observational study with twelve novice ASL learners who used this dictionary during video-comprehension and question-answering tasks. Our results address human-AI interaction challenges not covered in previous WoZ research, including recording and resubmitting signs, unpredictable outputs, system latency, and privacy concerns. These insights offer guidance for designing and deploying video-based ASL dictionary systems.
