Active Learning for Multilingual Fingerspelling Corpora
Shuai Wang, Eric Nalisnick
TL;DR
The paper tackles data scarcity in sign-language recognition by coupling active learning with cross-language transfer on multilingual fingerspelling corpora. It demonstrates that active learning with variation ratios consistently yields gains over random sampling, and that near full-data performance is reachable from a small fraction of the data for most datasets. Transfer active learning shows more nuanced benefits: pre-training on related fingerspelling data can help, particularly when the source and target corpora are visually similar (e.g., German Sign Language (GSL) to Irish Sign Language (ISL)), but the benefit is not guaranteed and may be driven by visual rather than linguistic factors. These results underscore the importance of disentangling visual and linguistic similarity in transfer learning for sign-language tasks and point to future directions for more robust, data-efficient sign-language systems.
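To make the acquisition criterion concrete, the following is a minimal sketch of variation-ratio selection, assuming the common definition 1 - max_c p(c | x) applied to a model's predictive class probabilities. The function names `variation_ratio` and `select_batch` and the example probabilities are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Sequence


def variation_ratio(probs: Sequence[float]) -> float:
    """Return 1 - max_c p(c | x) for one predictive distribution."""
    return 1.0 - max(probs)


def select_batch(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k pool items with the highest variation ratio."""
    scores = [variation_ratio(p) for p in pool_probs]
    # Rank pool indices from most to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical class probabilities for four unlabeled images (3 classes each).
pool = [
    [0.90, 0.05, 0.05],  # confident prediction, low variation ratio
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],  # near-uniform prediction, high variation ratio
]
print(select_batch(pool, k=2))  # prints [3, 1]
```

In an active learning loop, the selected indices would be sent for labeling and added to the training set before the model is retrained; random sampling, the baseline mentioned above, would instead draw the k indices uniformly from the pool.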
Abstract
We apply active learning to mitigate data scarcity in sign languages. In particular, we perform a novel analysis of the effect of pre-training. Since many sign languages are linguistic descendants of French Sign Language, they share hand configurations, which pre-training may be able to exploit. We test this hypothesis on American, Chinese, German, and Irish fingerspelling corpora. We do observe a benefit from pre-training, but this may be due to visual rather than linguistic similarities.
