Gradable ChatGPT Translation Evaluation

Hui Jiao, Bei Peng, Lu Zong, Xiaojun Zhang, Xinwei Li

TL;DR

This work addresses the sensitivity of ChatGPT-based translation to prompt design and introduces the T3S gradable prompting taxonomy for translation tasks. By structuring prompts along expression type, translation style, POS information, and few-shot prompts, the authors demonstrate systematic improvements in translation quality on Flores-101 Chinese–English data, with Level-4 prompts achieving a BLEU score of 42.88 and surpassing zero-shot GPT-4 baselines. The study provides concrete prompt-construction guidelines and case analyses, and suggests broader evaluation across languages and comparisons with established MT services. Overall, the T3S taxonomy offers a practical framework for improving LLM-driven translation through principled prompt design, with clear directions for future expansion and cross-domain applications.

Abstract

ChatGPT, a language model based on large-scale pre-training, has had a profound influence on machine translation. In ChatGPT, a "prompt" is a segment of text or an instruction used to steer the model towards a specific kind of response. The design of a translation prompt is a key factor that can, to some degree, influence the style, precision and accuracy of the translation. However, there is no common standard or methodology for designing and selecting translation prompts. Accordingly, this paper proposes a generic taxonomy that defines gradable translation prompts in terms of expression type, translation style, POS information and explicit statement, facilitating the construction of prompts with distinct attributes tailored to various translation tasks. Experiments and case studies are used to validate and illustrate the effectiveness of the method.
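
To make the four dimensions named in the abstract more concrete, the following is a minimal sketch of how a prompt might be assembled from them. It is not the authors' implementation: the class `TranslationPrompt`, its field names, the default instruction wording, and the way optional components are appended are all assumptions made for illustration, and the sketch does not reproduce the paper's level definitions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TranslationPrompt:
    """Hypothetical container for the four dimensions named in the abstract."""

    source_lang: str
    target_lang: str
    # Expression type: the base instruction wording (assumed template).
    expression: str = "Translate the following text from {src} to {tgt}."
    # Translation style, e.g. "formal" (optional).
    style: Optional[str] = None
    # POS information as (word, tag) pairs (optional).
    pos_tags: Optional[List[Tuple[str, str]]] = None
    # Explicit statement: any extra free-text instruction (optional).
    explicit_statement: Optional[str] = None

    def build(self, text: str) -> str:
        """Join the base instruction, any optional components, and the text."""
        parts = [self.expression.format(src=self.source_lang, tgt=self.target_lang)]
        if self.style:
            parts.append(f"Translation style: {self.style}.")
        if self.pos_tags:
            tagged = ", ".join(f"{word}/{tag}" for word, tag in self.pos_tags)
            parts.append(f"Part-of-speech hints: {tagged}.")
        if self.explicit_statement:
            parts.append(self.explicit_statement)
        parts.append(f"Text: {text}")
        return "\n".join(parts)


if __name__ == "__main__":
    basic = TranslationPrompt("Chinese", "English")
    print(basic.build("你好"))

    richer = TranslationPrompt(
        "Chinese",
        "English",
        style="formal",
        pos_tags=[("你好", "INTJ")],
        explicit_statement="Return only the translation.",
    )
    print(richer.build("你好"))
```

Adding one component at a time in this way gives a series of progressively more specific prompts, which is one plausible reading of "gradable"; the actual grading scheme is defined in the body of the paper.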

Paper Structure (15 sections, 1 equation, 1 figure, 3 tables)