Select and Reorder: A Novel Approach for Neural Sign Language Production
Harry Walsh, Ben Saunders, Richard Bowden
TL;DR
This work tackles sign language translation under data scarcity by decomposing the Text to Gloss (T2G) task into Gloss Selection (GS) and Gloss Reordering (GR) within a non-autoregressive framework. GS uses lexical overlap and alignment techniques (Word2Vec and BERT) to map spoken words to gloss tokens in Spoken Language Order (SPO), while GR reorders the gloss or text sequence into sign order using either a statistical pre-reordering model or a learned transformer with a reordering mask. The end-to-end S&R pipeline combines the GS output with a mapping M to yield final gloss sequences in sign order, achieving state-of-the-art BLEU and Rouge scores on mDGS and PHOENIX14T, with a notable BLEU-1 improvement of 37.88% on T2G translation for mDGS. The approach demonstrates substantial practical gains in speed and effectiveness for sign language translation in resource-constrained settings and suggests a viable blueprint for cross-language data augmentation and multilingual modeling in SLT.
Abstract
Sign languages, often categorised as low-resource languages, face significant challenges in achieving accurate translation due to the scarcity of parallel annotated datasets. This paper introduces Select and Reorder (S&R), a novel approach that addresses data scarcity by breaking down the translation process into two distinct steps: Gloss Selection (GS) and Gloss Reordering (GR). Our method leverages large spoken language models and the substantial lexical overlap between source spoken languages and target sign languages to establish an initial alignment. Both steps make use of Non-AutoRegressive (NAR) decoding for reduced computation and faster inference speeds. Through this disentanglement of tasks, we achieve state-of-the-art BLEU and Rouge scores on the Meine DGS Annotated (mDGS) dataset, demonstrating a substantial BLEU-1 improvement of 37.88% in Text to Gloss (T2G) Translation. This innovative approach paves the way for more effective translation models for sign languages, even in resource-constrained settings.
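
The two-step decomposition can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the authors' implementation: the lexicon `TOY_LEXICON`, the functions `select_glosses` and `reorder_glosses`, and the example permutation are all hypothetical, and a hand-written dictionary and a fixed index list stand in for the learned selection and reordering models. The example sentence and its target order are invented for illustration and make no linguistic claim about any sign language.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon: spoken word -> gloss, or None when the word
# has no gloss counterpart and is dropped. A real system would learn this.
TOY_LEXICON: Dict[str, Optional[str]] = {
    "i": "I",
    "am": None,
    "going": "GO",
    "home": "HOME",
    "tomorrow": "TOMORROW",
}


def select_glosses(words: List[str], lexicon: Dict[str, Optional[str]]) -> List[str]:
    """Gloss Selection stand-in: map each word to at most one gloss.

    Each position is handled independently of earlier outputs, and the
    result stays in spoken language order.
    """
    glosses = []
    for word in words:
        gloss = lexicon.get(word.lower())
        if gloss is not None:
            glosses.append(gloss)
    return glosses


def reorder_glosses(glosses: List[str], order: List[int]) -> List[str]:
    """Gloss Reordering stand-in: apply a permutation to the glosses.

    order[i] is the index of the selected gloss placed at output position i.
    """
    if sorted(order) != list(range(len(glosses))):
        raise ValueError("order must be a permutation of the gloss indices")
    return [glosses[i] for i in order]


spoken = "I am going home tomorrow".split()
spoken_order_glosses = select_glosses(spoken, TOY_LEXICON)
# Assumed permutation for illustration only; a model would predict it.
sign_order_glosses = reorder_glosses(spoken_order_glosses, [3, 0, 2, 1])
print(spoken_order_glosses)
print(sign_order_glosses)
```

Separating the steps this way means the selection stage only has to decide which gloss, if any, each word maps to, while the reordering stage only has to predict a permutation.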
