Diverse Sign Language Translation

Xin Shen, Lei Shen, Shaozu Yuan, Heming Du, Haiyang Sun, Xin Yu

TL;DR

This work introduces a Diverse Sign Language Translation (DivSLT) task, aiming to generate diverse yet accurate translations for sign language videos, and investigates multi-reference training strategies to enable the DivSLT model to achieve diverse translations.

Abstract

As in spoken languages, a single sign language expression can correspond to multiple valid textual interpretations. Hence, learning a rigid one-to-one mapping for sign language translation (SLT) models might be inadequate, particularly when data is limited. In this work, we introduce a Diverse Sign Language Translation (DivSLT) task, aiming to generate diverse yet accurate translations for sign language videos. First, we employ large language models (LLMs) to generate multiple references for the widely used CSL-Daily and PHOENIX14T SLT datasets. Native speakers are invited only to touch up inaccurate references, which significantly improves annotation efficiency. Second, we provide a benchmark model to spur research on this task. Specifically, we investigate multi-reference training strategies that enable our DivSLT model to produce diverse translations. Then, to enhance translation accuracy, we employ a max-reward-driven reinforcement learning objective that maximizes the reward of the translated result. Additionally, we use multiple metrics to assess the accuracy, diversity, and semantic precision of DivSLT outputs. Experimental results on the enriched datasets demonstrate that our DivSLT method achieves not only better translation performance but also diverse translation results.
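To make the idea of a max-reward objective over multiple references concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a hypothesis is scored against each available reference and that the highest score is used as the reward in a REINFORCE-style loss. The unigram F1 scorer is a simple stand-in for whatever reward metric the paper actually uses, and the names `unigram_f1`, `max_reward`, and `reinforce_loss` are hypothetical.

```python
from collections import Counter


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Stand-in reward (assumption): unigram F1 between two whitespace-tokenized strings."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count_hyp, count_ref) times.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def max_reward(hypothesis: str, references: list[str]) -> float:
    """Score the hypothesis against every reference and keep the best score."""
    return max(unigram_f1(hypothesis, ref) for ref in references)


def reinforce_loss(log_prob: float, reward: float, baseline: float = 0.0) -> float:
    """REINFORCE-style loss for one sampled hypothesis (assumed form, not from the paper).

    log_prob is the summed token log-probability of the sampled hypothesis;
    baseline is an optional variance-reduction term.
    """
    return -(reward - baseline) * log_prob


if __name__ == "__main__":
    references = ["it will rain tomorrow", "tomorrow there will be rain"]
    sample = "tomorrow rain"
    reward = max_reward(sample, references)
    print(reward, reinforce_loss(log_prob=-2.0, reward=reward, baseline=0.5))
```

Taking the maximum rather than the average means a hypothesis is not penalized for matching one valid phrasing closely while differing from the others, which fits the premise that several translations of the same signing can be correct.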

Paper Structure

This paper contains 33 sections, 8 equations, 5 figures, 18 tables.

Figures (5)

  • Figure 1: Illustration of our pipeline of leveraging an LLM to generate multiple translations that closely resemble the ground-truth translation.
  • Figure 2: Human evaluation on diverse sign language translation results. Score 5 denotes that the translation results are completely faithful and exhibit diversity in expressions, and Score 0 indicates a complete lack of faithfulness or absence of diversity.
  • Figure 3: Different visual-language pre-training strategies. (a) One-to-One GFSLT-VLP. (b) One-to-Many DivSLT-VLP.
  • Figure 4: B-BM scores of the Top-$k$th hypothesis generated by GFSLT and DivSLT.
  • Figure 5: BRT-BM scores of the Top-$k$th hypothesis generated by GFSLT and DivSLT.