Exploring the Potential Role of Generative AI in the TRAPD Procedure for Survey Translation

Erica Ann Metheney, Lauren Yehle

TL;DR

This work tackles translation errors in multilingual surveys by examining how zero-shot prompts to generative AI (GPT-3.5 and GPT-4) can flag features of survey questions that are difficult to translate, as a complement to TRAPD-style workflows. It employs a 2x3 factorial design over 282 source questions drawn from Gallup, WVS, and LGPI, with Castilian Spanish and Mandarin Chinese as target audiences, and uses a qualitative codebook of 10 codes to categorize the issues the AI identifies. The findings reveal nuanced model and audience effects: GPT-4 often flags fewer codes overall but is more likely to flag certain ones (e.g., syntax and sensitivity), while specifying a target audience systematically changes the likelihood of NOTA and other codes; question source also shifts coding rates. Practically, the study demonstrates AI's potential to augment, not replace, translation best practices, offering actionable prompts and a pathway for integration into the TRAPD process, and it outlines future work on accuracy, broader languages, and few-shot prompting.

Abstract

This paper explores and assesses in what ways generative AI can assist in translating survey instruments. Writing effective survey questions is a challenging and complex task, made even more difficult for surveys that will be translated and deployed in multiple linguistic and cultural settings. Translation errors can be detrimental, with known errors rendering data unusable for its intended purpose and undetected errors leading to incorrect conclusions. A growing number of institutions face this problem as surveys deployed by private and academic organizations globalize, and the success of their current efforts depends heavily on researchers' and translators' expertise and the amount of time each party has to contribute to the task. Thus, multilinguistic and multicultural surveys produced by teams with limited expertise, budgets, or time are at significant risk for translation-based errors in their data. We implement a zero-shot prompt experiment using ChatGPT to explore generative AI's ability to identify features of questions that might be difficult to translate to a linguistic audience other than the source language. We find that ChatGPT can provide meaningful feedback on translation issues, including common source survey language, inconsistent conceptualization, sensitivity and formality issues, and nonexistent concepts. In addition, we provide detailed information on the practicality of the approach, including accessing the necessary software, associated costs, and computational run times. Lastly, based on our findings, we propose avenues for future research that integrate AI into survey translation practices.
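As a rough illustration of the 2x3 zero-shot design summarized above, the following sketch enumerates treatment-question pairs. It is not the authors' code. The prompt wording, the function names, the model labels, and the assumption that the third audience level is "no audience specified" are all hypothetical. No API call is made; the sketch only builds the prompts.

```python
from itertools import product

# Assumption: model labels are plain strings for illustration, not exact API identifiers.
MODELS = ["gpt-3.5", "gpt-4"]

# Assumption: the three audience levels are "none specified" plus the two
# target audiences named in the summary.
AUDIENCES = [None, "Castilian Spanish", "Mandarin Chinese"]


def build_prompt(question, audience):
    """Build a hypothetical zero-shot prompt (no examples are supplied to the model)."""
    if audience is None:
        target = "a linguistic audience other than the source language"
    else:
        target = f"a {audience}-speaking audience"
    return (
        "Identify features of the following survey question that might be "
        f"difficult to translate for {target}.\n\nQuestion: {question}"
    )


def build_treatment_grid(questions):
    """Cross every model and audience (2 x 3 = 6 treatments) with every question."""
    return [
        {
            "model": model,
            "audience": audience,
            "question": question,
            "prompt": build_prompt(question, audience),
        }
        for model, audience in product(MODELS, AUDIENCES)
        for question in questions
    ]
```

Under these assumptions, 282 source questions would yield 6 x 282 = 1,692 treatment-question pairs, each of which would be sent to the corresponding model and then coded against the codebook.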

Paper Structure

This paper contains 50 sections, 2 equations, 12 figures, 18 tables, 1 algorithm.

Figures (12)

  • Figure 1: Distribution of the Number of Codes at the Treatment-Question Level
  • Figure 2: 95% Confidence Intervals of the Model Effect on the Likelihood of Flagging Each Code
  • Figure 3: Predicted Probabilities from Hierarchical Logistic Regression Results - Statistically Significant Model by Target Audience Interaction Effects on the Likelihood of Flagging Codes 5, 7, 9, and NOTA
  • Figure 4: Average Statement Placement by Code, Sorted from Least to Greatest and Labelled with Groupings from Tukey HSD Test
  • Figure 5: Number of Codes Flagged by Each Treatment
  • ...and 7 more figures