Towards Privacy-Aware Sign Language Translation at Scale
Phillip Rust, Bowen Shi, Skyler Wang, Necati Cihan Camgöz, Jean Maillard
TL;DR
Data scarcity and privacy concerns hinder scalable sign language translation (SLT). The authors propose SSVP-SLT, a privacy-aware two-stage framework that pretrains a sign language encoder (SignHiera) with masked autoencoding on anonymized, unannotated video and then finetunes it on a curated parallel corpus, optionally adding language-supervised pretraining. On How2Sign, SSVP-SLT achieves state-of-the-art finetuned and zero-shot gloss-free SLT performance, surpassing the strongest respective baselines by over 3 BLEU-4. The work also introduces DailyMoth-70h, a new single-signer benchmark. It shows that facial blurring protects privacy with limited performance loss, highlights the substantial compute required for large-scale video pretraining, and discusses directions for expanding language coverage and privacy techniques in SLT.
Abstract
A major impediment to the advancement of sign language translation (SLT) is data scarcity. Much of the sign language data currently available on the web cannot be used for training supervised models due to the lack of aligned captions. Furthermore, scaling SLT using large-scale web-scraped datasets bears privacy risks due to the presence of biometric information, which the responsible development of SLT technologies should account for. In this work, we propose a two-stage framework for privacy-aware SLT at scale that addresses both of these issues. We introduce SSVP-SLT, which leverages self-supervised video pretraining on anonymized and unannotated videos, followed by supervised SLT finetuning on a curated parallel dataset. SSVP-SLT achieves state-of-the-art finetuned and zero-shot gloss-free SLT performance on the How2Sign dataset, outperforming the strongest respective baselines by over 3 BLEU-4. Based on controlled experiments, we further discuss the advantages and limitations of self-supervised pretraining and anonymization via facial obfuscation for SLT.
