
Select and Reorder: A Novel Approach for Neural Sign Language Production

Harry Walsh, Ben Saunders, Richard Bowden

TL;DR

This work tackles sign language translation (SLT) under data scarcity by decomposing the Text-to-Gloss (T2G) task into Gloss Selection (GS) and Gloss Reordering (GR) within a non-autoregressive framework. GS uses lexical overlap and alignment techniques (Word2Vec and BERT) to map spoken words to gloss tokens, while GR reorders the glosses or text into sign order using either a statistical pre-reordering model or a learned transformer with a reordering mask. The end-to-end Select and Reorder (S&R) pipeline combines the GS output with a mapping M to produce final gloss sequences in sign order. It achieves state-of-the-art BLEU and ROUGE scores on mDGS and PHOENIX14T, including a BLEU-1 improvement of 37.88% on T2G translation for mDGS. The approach offers practical gains in inference speed and translation quality in resource-constrained settings, and suggests a blueprint for cross-language data augmentation and multilingual modeling in SLT.
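
The two-step decomposition can be illustrated with a toy sketch. It is not the authors' implementation: the word-to-gloss lexicon `LEXICON` and the per-gloss position scores passed to `select_and_reorder` are hypothetical stand-ins for the learned GS and GR models, and all helper names are illustrative. GS decides each source position independently (the non-autoregressive idea), and GR then sorts the selected glosses by a target-position score.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon standing in for the GS alignment step:
# each spoken-language word maps to a gloss, or to None (no sign produced).
LEXICON: Dict[str, Optional[str]] = {
    "and": None,
    "now": "NOW",
    "the": None,
    "weather": "WEATHER",
    "forecast": None,
    "for": None,
    "tomorrow": "TOMORROW",
}


def gloss_selection(tokens: List[str]) -> List[Optional[str]]:
    """Select a gloss (or None) for every source token independently.

    Each position is decided on its own, so no output depends on
    previously generated outputs (non-autoregressive in spirit).
    """
    return [LEXICON.get(tok.lower()) for tok in tokens]


def gloss_reordering(glosses: List[str], scores: List[float]) -> List[str]:
    """Reorder glosses by a per-gloss target-position score (lower = earlier).

    The scores are a hypothetical stand-in for a learned reordering model.
    """
    order = sorted(range(len(glosses)), key=lambda i: scores[i])
    return [glosses[i] for i in order]


def select_and_reorder(tokens: List[str], scores: Dict[str, float]) -> List[str]:
    """Run toy GS, drop unselected tokens, then apply toy GR."""
    selected = [g for g in gloss_selection(tokens) if g is not None]
    return gloss_reordering(selected, [scores[g] for g in selected])
```

For example, `select_and_reorder("and now the weather forecast for tomorrow".split(), {"TOMORROW": 0.0, "NOW": 1.0, "WEATHER": 2.0})` selects NOW, WEATHER and TOMORROW and returns `["TOMORROW", "NOW", "WEATHER"]`. The scores are invented only to make the reordering visible and do not claim to reflect real sign order.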

Abstract

Sign languages, often categorised as low-resource languages, face significant challenges in achieving accurate translation due to the scarcity of parallel annotated datasets. This paper introduces Select and Reorder (S&R), a novel approach that addresses data scarcity by breaking down the translation process into two distinct steps: Gloss Selection (GS) and Gloss Reordering (GR). Our method leverages large spoken language models and the substantial lexical overlap between source spoken languages and target sign languages to establish an initial alignment. Both steps use Non-AutoRegressive (NAR) decoding for reduced computation and faster inference. Through this disentanglement of tasks, we achieve state-of-the-art BLEU and ROUGE scores on the Meine DGS Annotated (mDGS) dataset, demonstrating a substantial BLEU-1 improvement of 37.88% in Text-to-Gloss (T2G) translation. This approach paves the way for more effective translation models for sign languages, even in resource-constrained settings.
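
BLEU-1 is the clipped unigram precision multiplied by a brevity penalty. The sketch below computes a sentence-level version against a single reference. Reported results are usually corpus-level, pooling match counts and lengths over the whole test set, so this function only illustrates the definition and is not the evaluation script used in the paper.

```python
import math
from collections import Counter
from typing import List


def bleu1(hypothesis: List[str], reference: List[str]) -> float:
    """Sentence-level BLEU-1 against a single reference, in [0, 1]."""
    if not hypothesis:
        return 0.0
    hyp_counts = Counter(hypothesis)
    ref_counts = Counter(reference)
    # Clipped matches: a hypothesis token counts at most as often as it
    # appears in the reference (Counter returns 0 for missing tokens).
    matches = sum(min(n, ref_counts[tok]) for tok, n in hyp_counts.items())
    precision = matches / len(hypothesis)
    # Brevity penalty: penalise hypotheses that are not longer than the reference.
    c, r = len(hypothesis), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision
```

With hypothesis `["NOW", "WEATHER", "TOMORROW"]` and reference `["NOW", "WEATHER", "TOMORROW", "THURSDAY"]`, every unigram matches (precision 1.0), but the shorter hypothesis is penalised by exp(1 - 4/3), giving roughly 0.72.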

Paper Structure (22 sections, 11 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: An example of Gloss Selection (GS) and Gloss Reordering (GR) applied to a sentence from the PHOENIX14T dataset.
  • Figure 2: An overview of the S&R approach.
  • Figure 3: An example of the alignment found using BERT embeddings to connect the spoken language to the glosses on the PHOENIX14T dataset. (SRC: "and now the weather forecast for tomorrow thursday the twelfth of august", TRG: "now weather tomorrow thursday twelve february")
  • Figure 4: An example of the alignment found using BERT embeddings to connect the spoken language to the glosses on the mDGS dataset. (SRC: "when you keep in touch you automatically become healthy and happy", TRG: "contact care automatic body healthy glad")
  • Figure 5: Example mDGS translations from a baseline transformer and the GS and S&R models.
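
Figures 3 and 4 show alignments between spoken words and glosses obtained from BERT embeddings. One simple way to extract such links, assumed here purely for illustration, is to connect each gloss to the source word with the highest cosine similarity. The vectors in `TOY_EMB` are invented three-dimensional stand-ins for real BERT or Word2Vec embeddings, and this argmax rule is not necessarily the extraction method used in the paper.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical toy embeddings; real ones would come from BERT or Word2Vec.
TOY_EMB: Dict[str, Vector] = {
    "the": [0.1, 0.1, 0.9],
    "weather": [0.9, 0.1, 0.0],
    "tomorrow": [0.0, 0.9, 0.1],
    "WEATHER": [0.8, 0.2, 0.1],
    "TOMORROW": [0.1, 0.8, 0.0],
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(src: List[str], trg: List[str], emb: Dict[str, Vector]) -> List[Tuple[str, str]]:
    """Link every target gloss to its most similar source word (argmax cosine)."""
    pairs = []
    for gloss in trg:
        best = max(src, key=lambda w: cosine(emb[w], emb[gloss]))
        pairs.append((best, gloss))
    return pairs
```

Calling `align(["the", "weather", "tomorrow"], ["WEATHER", "TOMORROW"], TOY_EMB)` links WEATHER to "weather" and TOMORROW to "tomorrow", while the function word "the" is left unaligned, much like the dropped words in the figures.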