Representing Signs as Signs: One-Shot ISLR to Facilitate Functional Sign Language Technologies
Toon Vandendriessche, Mathieu De Coster, Annelies Lejon, Joni Dambre
TL;DR
The paper tackles the scalability challenge of Isolated Sign Language Recognition (ISLR) across languages and evolving vocabularies. It pretrains a model to produce language-independent sign embeddings, then performs one-shot recognition via dense vector search over those embeddings. Using PoseFormer with keypoint inputs, it achieves state-of-the-art performance on ASL_Citizen and generalizes across languages to large dictionaries: on a dictionary of 10,235 signs from a language not seen in training, it reaches a one-shot MRR of $0.508$. These findings indicate that representing signs as signs, rather than as translations, enables robust, scalable ISLR even as vocabularies grow and signing contexts vary. The work was co-created with the Deaf and Hard of Hearing (DHH) community and culminated in a publicly available dictionary lookup tool, highlighting practical impact for DHH users.
Abstract
Isolated Sign Language Recognition (ISLR) is crucial for scalable sign language technology, yet language-specific approaches limit current models. To address this, we propose a one-shot learning approach that generalises across languages and evolving vocabularies. Our method involves pretraining a model to embed signs based on essential features and using a dense vector search for rapid, accurate recognition of unseen signs. We achieve state-of-the-art results, including 50.8% one-shot MRR on a large dictionary containing 10,235 unique signs from a different language than the training set. Our approach is robust across languages and support sets, offering a scalable, adaptable solution for ISLR. Co-created with the Deaf and Hard of Hearing (DHH) community, this method aligns with real-world needs and advances scalable sign language recognition.
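
The one-shot lookup and the MRR metric described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional embeddings are hand-made toy vectors standing in for the output of a pretrained encoder, cosine similarity is an assumed choice of distance (the text specifies only "dense vector search"), and the names `rank_signs` and `mean_reciprocal_rank` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_signs(query, support):
    """Return support-set labels ordered from most to least similar to the query."""
    return sorted(
        support,
        key=lambda label: cosine_similarity(query, support[label]),
        reverse=True,
    )

def mean_reciprocal_rank(queries, support):
    """Average of 1 / rank of the correct label over all (embedding, label) queries."""
    total = 0.0
    for embedding, true_label in queries:
        ranking = rank_signs(embedding, support)
        total += 1.0 / (ranking.index(true_label) + 1)
    return total / len(queries)

# One-shot support set: a single (toy) embedding per dictionary sign.
support = {
    "HELLO": [1.0, 0.0, 0.0],
    "THANKS": [0.0, 1.0, 0.0],
    "HOUSE": [0.0, 0.0, 1.0],
}

# Toy query embeddings paired with their true labels.
queries = [
    ([0.9, 0.1, 0.0], "HELLO"),   # correct sign ranked first
    ([0.6, 0.5, 0.1], "THANKS"),  # correct sign ranked second, behind HELLO
    ([0.1, 0.2, 0.9], "HOUSE"),   # correct sign ranked first
]

mrr = mean_reciprocal_rank(queries, support)
print(f"one-shot MRR: {mrr:.3f}")
```

Because recognition is a nearest-neighbour search over stored embeddings, adding a new sign in this sketch only requires adding one entry to `support`; no retraining step is involved. For a dictionary on the scale of 10,235 signs, the linear scan here would typically be replaced by a vector index, which is left out of this sketch.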
