Fingerspelling within Sign Language Translation
Garrett Tanzer
TL;DR
This work examines how well ASL-to-English translation models handle fingerspelling. It introduces a dedicated evaluation protocol (FLEURS-ASL-FS) and two concrete interventions: character-level tokenization via ByT5 and cotraining with a fingerspelling recognition dataset (FSboard). The authors annotate 1749 FLEURS-ASL sentences to identify fingerspelled spans, enabling span-based evaluation of translation outputs. Results show that ByT5’s character-level representation substantially improves both overall translation quality and the accuracy of fingerspelled terms within translations, while cotraining with FSboard data yields mixed results. The study advocates adopting character-level tokenization as standard practice in sign language translation and provides a scalable evaluation framework for examining fingerspelling within translation more broadly.
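As a rough illustration of what span-based evaluation of fingerspelled terms could look like, the sketch below computes the fraction of annotated fingerspelled terms that reappear in a model's English output. It is a minimal sketch under stated assumptions: the function names, the normalization, and the exact-match recall criterion are hypothetical and are not taken from the FLEURS-ASL-FS protocol itself.

```python
import re
from typing import List, Optional


def normalize(text: str) -> List[str]:
    """Lowercase, replace punctuation with spaces, and split into word tokens."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def contains_term(hyp_tokens: List[str], term_tokens: List[str]) -> bool:
    """True if term_tokens occurs as a contiguous run inside hyp_tokens."""
    n = len(term_tokens)
    if n == 0:
        return False
    return any(
        hyp_tokens[i:i + n] == term_tokens
        for i in range(len(hyp_tokens) - n + 1)
    )


def fingerspelling_recall(hypothesis: str, terms: List[str]) -> Optional[float]:
    """Fraction of annotated fingerspelled terms found in the hypothesis.

    Hypothetical metric for illustration; returns None when the sentence
    has no annotated fingerspelled terms.
    """
    if not terms:
        return None
    hyp_tokens = normalize(hypothesis)
    hits = sum(contains_term(hyp_tokens, normalize(t)) for t in terms)
    return hits / len(terms)


# Made-up example: one of two annotated terms survives in the translation.
score = fingerspelling_recall(
    "The city of Lagos is in Nigeria.", ["Lagos", "Abuja"]
)
print(score)
```

An exact-match criterion like this is strict: a near miss such as a single wrong letter counts as a failure, so a softer character-level distance could be substituted if partial credit is wanted.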
Abstract
Fingerspelling poses challenges for sign language processing due to its high-frequency motion and use for open-vocabulary terms. While prior work has studied fingerspelling recognition, there has been little attention to evaluating, or improving, how well sign language translation models understand fingerspelling in the context of entire sentences. We manually annotate instances of fingerspelling within FLEURS-ASL and use them to evaluate the effect of two simple measures to improve fingerspelling recognition within American Sign Language to English translation: 1) use a model family (ByT5) with character- rather than subword-level tokenization, and 2) mix fingerspelling recognition data into the translation training mixture. We find that the first measure substantially improves understanding of fingerspelling (and therefore translation quality overall), but the effect of the second is mixed.
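To make the contrast between character-level and subword-level tokenization concrete, the sketch below tokenizes a fingerspelled name both ways. The byte-level function mirrors the general ByT5 scheme of mapping each UTF-8 byte to an id shifted past a few reserved special tokens; the offset of 3 is stated here as an assumption. The subword tokenizer and its vocabulary are toy, hypothetical stand-ins, not the vocabulary of any real model.

```python
from typing import List, Set


def byte_tokenize(text: str, offset: int = 3) -> List[int]:
    """One id per UTF-8 byte, shifted by `offset` to leave room for
    special tokens (assumed here to be pad, eos, unk)."""
    return [b + offset for b in text.encode("utf-8")]


def greedy_subword_tokenize(word: str, vocab: Set[str]) -> List[str]:
    """Toy greedy longest-match segmentation over a hypothetical vocabulary.
    Characters not covered by the vocabulary become single-character pieces."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])
            i += 1
    return pieces


# Hypothetical subword vocabulary, for illustration only.
TOY_VOCAB = {"Tan", "zer", "er", "T"}

name = "Tanzer"
print(byte_tokenize(name))                       # one id per letter
print(greedy_subword_tokenize(name, TOY_VOCAB))  # multi-letter pieces
```

With byte-level ids, each output token corresponds to one fingerspelled letter, whereas the subword pieces group letters in ways unrelated to how the name is signed. This is offered as intuition for the first measure, not as an explanation the abstract itself gives.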
