ChatGPT as a Translation Engine: A Case Study on Japanese-English
Vincent Michael Sutanto, Giovanni Gatti De Giacomo, Toshiaki Nakazawa, Masaru Yamada
TL;DR
This study evaluates ChatGPT-3.5 and ChatGPT-4 as JA-EN translation engines, comparing document-level with sentence-level translation and simple with enhanced prompts, and benchmarking both models against two commercial MT systems. It draws on several JA-EN datasets (ParaNatCom, FLORES, Novels, KFTT, WMT News) and combines automatic metrics (BLEU, COMET, DA-BERT) with MQM-based human evaluation; samples are kept small because of API constraints, and an open-source MQM tool is released. Document-level translation improves context preservation and overall quality, while the gains from enhanced prompts are inconclusive. ChatGPT-3.5 tends to be more accurate and ChatGPT-4 more fluent, so the better choice depends on the task. In practice, ChatGPT produces JA-EN translations that are competitive with commercial MT, though latency remains a consideration, and prompting offers a route to domain-adapted output.
Abstract
This study investigates ChatGPT for Japanese-English translation, exploring simple and enhanced prompts and comparing against commercially available translation engines. Performing both automatic and MQM-based human evaluations, we found that document-level translation outperforms sentence-level translation for ChatGPT. However, we could not determine whether enhanced prompts performed better than simple prompts in our experiments. We also found that ChatGPT-3.5 was preferred by automatic evaluation, but that a tradeoff exists between accuracy (ChatGPT-3.5) and fluency (ChatGPT-4). Lastly, ChatGPT yields competitive results against two widely known translation systems.
