Mitigating Translationese in Low-resource Languages: The Storyboard Approach

Garry Kuwanto, Eno-Abasi E. Urua, Priscilla Amondi Amuok, Shamsuddeen Hassan Muhammad, Anuoluwapo Aremu, Verrah Otiende, Loice Emma Nanyanga, Teresiah W. Nyoike, Aniefon D. Akpan, Nsima Ab Udouboh, Idongesit Udeme Archibong, Idara Effiong Moses, Ifeoluwatayo A. Ige, Benjamin Ajibade, Olumide Benjamin Awokoya, Idris Abdulmumin, Saminu Mohammad Aliyu, Ruqayya Nasir Iro, Ibrahim Said Ahmad, Deontae Smith, Praise-EL Michaels, David Ifeoluwa Adelani, Derry Tanti Wijaya, Anietie Andy

TL;DR

This study addresses translationese in data collection for low-resource languages by introducing a storyboard-based visual elicitation method in which native speakers describe scenes without exposure to the source text. Compared with traditional text translation, the storyboard approach yields translations that are more fluent and lexically diverse, though slightly less accurate as judged by human evaluators. Using both human judgments and automated metrics (LASER embeddings, MTLD, POS perplexity), the work demonstrates a trade-off: higher fluency and lexical variety with storyboard-derived translations, versus stronger semantic fidelity with text translations. The contribution includes data from four African languages, including Ibibio, and points toward practical ways to mitigate translationese in language resources, with future work on automating storyboard generation via AI and extending the approach to more languages and domains.

Abstract

Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach for data collection by leveraging storyboards to elicit more fluent and natural sentences. Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text. We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency. Human annotators and quantitative metrics were used to assess translation quality. The results indicate a preference for text translation in terms of accuracy, while our method shows lower accuracy but better fluency in the languages studied.
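
The TL;DR names MTLD (Measure of Textual Lexical Diversity) among the automated metrics. As a rough illustration of what such a lexical-diversity score computes, the following is a minimal sketch of the commonly described bidirectional MTLD procedure, not the authors' implementation. The 0.72 threshold is the conventional default and is an assumption here, as are the function names, the pre-tokenized input, and the fallback used when a text contains no repeated tokens.

```python
from typing import List, Sequence


def _mtld_one_direction(tokens: Sequence[str], threshold: float) -> float:
    """Run one pass of MTLD over the tokens in the order given."""
    factors = 0.0          # number of completed (plus partial) factors
    types = set()          # distinct tokens seen in the current segment
    count = 0              # tokens seen in the current segment
    for tok in tokens:
        count += 1
        types.add(tok)
        # A factor is complete once the type-token ratio drops below threshold.
        if len(types) / count < threshold:
            factors += 1.0
            types = set()
            count = 0
    if count > 0:
        # Credit the unfinished segment proportionally to how far its
        # type-token ratio has fallen from 1.0 toward the threshold.
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    if factors == 0.0:
        # No repetition at all: the score is undefined. Returning the token
        # count is a convention chosen for this sketch (an assumption).
        return float(len(tokens))
    return len(tokens) / factors


def mtld(tokens: List[str], threshold: float = 0.72) -> float:
    """Average of the forward and backward MTLD passes."""
    if not tokens:
        return 0.0
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(tokens[::-1], threshold)
    return (forward + backward) / 2.0


if __name__ == "__main__":
    varied = "the cat sat on the mat".split()
    repetitive = ["a"] * 10
    print(mtld(varied), mtld(repetitive))
```

Higher values indicate that longer stretches of text keep their type-token ratio above the threshold, which is the sense in which storyboard-derived sentences are described as more lexically diverse. Tokenization is left to the caller because it is language-dependent.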

Paper Structure (27 sections, 1 equation, 2 figures, 9 tables)

Figures (2)

  • Figure 1: Example of an English sentence and image pair.
  • Figure 2: Comparison between the DALLE-3 generated storyboard (left) and the manually designed storyboard (right). The similarities highlight the potential of generative AI in automating the storyboard creation process.