Table of Contents
Fetching ...

Optimising ChatGPT for creativity in literary translation: A case study from English into Dutch, Chinese, Catalan and Spanish

Shuxiang Du, Ana Guerberof Arenas, Antonio Toral, Kyo Gerrits, Josep Marco Borillo

TL;DR

This study investigates how ChatGPT can be optimised for creativity in literary translation across four languages (Dutch, Chinese, Catalan, Spanish) by systematically varying text granularity, temperature, and prompting strategies, and comparing against DeepL and human references. Using Kurt Vonnegut's 2BR02B as the source, the authors annotate 54 UCPs across 54 sentences and compute a creativity index that combines novelty with acceptability via the formula $CI = \\left( \\frac{\\#CSs}{\\#UCPs} - \\frac{\\#error points}{\\#words in ST} \\ ight) \\times 100$. Results show substantial cross-language variability, with a general best practice of prompting ChatGPT to output translations creatively at temperature 1.0 for ES, NL, and ZH, though CA exhibits special sensitivity to sobriquets and sometimes favors document-level prompts. Across automatic metrics and human judgments, ChatGPT typically lags behind human translations, though it can outperform certain NMT baselines under specific prompting schemes. The findings highlight both the potential and current limits of AI-assisted literary translation, informing future work on prompt design, evaluation of creativity, and long-context translation quality.

Abstract

This study examines the variability of Chat-GPT machine translation (MT) outputs across six different configurations in four languages,with a focus on creativity in a literary text. We evaluate GPT translations in different text granularity levels, temperature settings and prompting strategies with a Creativity Score formula. We found that prompting ChatGPT with a minimal instruction yields the best creative translations, with "Translate the following text into [TG] creatively" at the temperature of 1.0 outperforming other configurations and DeepL in Spanish, Dutch, and Chinese. Nonetheless, ChatGPT consistently underperforms compared to human translation (HT).

Optimising ChatGPT for creativity in literary translation: A case study from English into Dutch, Chinese, Catalan and Spanish

TL;DR

This study investigates how ChatGPT can be optimised for creativity in literary translation across four languages (Dutch, Chinese, Catalan, Spanish) by systematically varying text granularity, temperature, and prompting strategies, and comparing against DeepL and human references. Using Kurt Vonnegut's 2BR02B as the source, the authors annotate 54 UCPs across 54 sentences and compute a creativity index that combines novelty with acceptability via the formula . Results show substantial cross-language variability, with a general best practice of prompting ChatGPT to output translations creatively at temperature 1.0 for ES, NL, and ZH, though CA exhibits special sensitivity to sobriquets and sometimes favors document-level prompts. Across automatic metrics and human judgments, ChatGPT typically lags behind human translations, though it can outperform certain NMT baselines under specific prompting schemes. The findings highlight both the potential and current limits of AI-assisted literary translation, informing future work on prompt design, evaluation of creativity, and long-context translation quality.

Abstract

This study examines the variability of Chat-GPT machine translation (MT) outputs across six different configurations in four languages,with a focus on creativity in a literary text. We evaluate GPT translations in different text granularity levels, temperature settings and prompting strategies with a Creativity Score formula. We found that prompting ChatGPT with a minimal instruction yields the best creative translations, with "Translate the following text into [TG] creatively" at the temperature of 1.0 outperforming other configurations and DeepL in Spanish, Dutch, and Chinese. Nonetheless, ChatGPT consistently underperforms compared to human translation (HT).

Paper Structure

This paper contains 22 sections, 1 equation, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Workflow for ZH and NL
  • Figure 2: Total CSs per Modality and Language
  • Figure 3: Total Error points per Modality and Language
  • Figure 4: Total CSs per best ChatGPT Modality and HT
  • Figure 5: Total Error points per best ChatGPT Modality and HT
  • ...and 3 more figures