Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach

Zhaokun Jiang; Qianxi Lv; Ziyin Zhang; Lei Lei

TL;DR

This paper investigates how translations produced by humans, neural machine translation (NMT), and ChatGPT differ linguistically and structurally within a diplomatic register. It builds a three-way framework around 147 Spokesperson's Remarks, employs rolling stylometry to create 210 samples, extracts 121 linguistic features, and applies both supervised classifiers and multidimensional analysis (MDA) to distinguish translation types, complemented by distance measures and t-SNE visualization. Key findings show that supervised models achieve near-perfect separation among the three types, unsupervised clustering fails to clearly separate them, and ChatGPT translations tend to resemble NMT more than human translations across most MDA dimensions; Euclidean distances and visualizations corroborate this proximity. The study offers insights into the interrelationships among translation types, with implications for improving NMT and AI translation systems and for determining when to rely on or combine AI tools with human expertise in diplomatic contexts.
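Rolling stylometry splits longer texts into fixed-size windows that advance by a fixed step, so each window can serve as one sample. The windows overlap when the step is smaller than the window. This summary does not state the window and step sizes used to obtain the 210 samples. The Python sketch below therefore treats them as hypothetical parameters (`window`, `step`) and assumes the text has already been tokenized into a list of words. It illustrates the general technique, not the authors' exact procedure.

```python
from typing import List


def rolling_windows(tokens: List[str], window: int, step: int) -> List[List[str]]:
    """Split a token sequence into fixed-size windows (rolling stylometry).

    Each window holds `window` tokens, and consecutive windows start `step`
    tokens apart, so they overlap whenever step < window. A trailing
    remainder shorter than `window` is dropped rather than padded.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive integers")
    # range() is empty when the text is shorter than one window.
    return [tokens[i:i + window] for i in range(0, len(tokens) - window + 1, step)]
```

For example, `rolling_windows("a b c d e f g".split(), window=3, step=2)` yields three windows: `[a, b, c]`, `[c, d, e]`, and `[e, f, g]`. Each window would then go through feature extraction as an independent sample.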

Abstract

The growing popularity of neural machine translation (NMT) and large language models (LLMs) such as ChatGPT underscores the need for a deeper understanding of their distinct characteristics and relationships. Such understanding is crucial for language professionals and researchers to make informed decisions and use these cutting-edge translation technologies judiciously, yet it remains underexplored. This study aims to fill this gap by investigating three key questions: (1) whether ChatGPT-generated translations can be distinguished from NMT output and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) whether ChatGPT-produced translations more closely resemble HT or NMT. To achieve these objectives, we employ statistical testing, machine learning algorithms, and multidimensional analysis (MDA) to analyze Spokesperson's Remarks and their translations. After a wide range of linguistic features is extracted, supervised classifiers distinguish the three translation types with high accuracy, whereas unsupervised clustering techniques do not yield satisfactory results. Another major finding is that ChatGPT-produced translations are more similar to NMT than to HT in most MDA dimensions, a result further corroborated by distance computation and visualization. These novel insights shed light on the interrelationships among the three translation types and have implications for future advancements in NMT and generative AI.
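The abstract reports that distance computation corroborates the MDA finding. This summary does not specify exactly which points the distances are measured between. The sketch below assumes one plausible setup:

- Each translation type is a group of samples described by numeric feature vectors.
- Features are z-scored across all pooled samples.
- The Euclidean distance is taken between group centroids.

The function names and the toy data are illustrative and not taken from the paper.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Vector = List[float]


def zscore_columns(rows: List[Vector]) -> List[Vector]:
    """Standardize each feature column to mean 0 and population SD 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has SD 0; divide by 1 instead to avoid ZeroDivisionError.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def centroid(rows: List[Vector]) -> Vector:
    """Mean feature vector of a group of samples."""
    return [mean(c) for c in zip(*rows)]


def pairwise_centroid_distances(
    groups: Dict[str, List[Vector]]
) -> Dict[Tuple[str, str], float]:
    """Euclidean distances between group centroids after pooled z-scoring."""
    labels = list(groups)
    pooled = [row for lab in labels for row in groups[lab]]
    z = zscore_columns(pooled)

    # Slice the standardized rows back into their groups (order is preserved).
    centroids: Dict[str, Vector] = {}
    start = 0
    for lab in labels:
        n = len(groups[lab])
        centroids[lab] = centroid(z[start:start + n])
        start += n

    # One distance per unordered pair of labels.
    distances: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            distances[(a, b)] = math.dist(centroids[a], centroids[b])
    return distances
```

Consider invented two-feature data such as `{"HT": [[0, 0], [0, 2]], "NMT": [[4, 0], [4, 2]], "ChatGPT": [[3, 0], [3, 2]]}`. Here the ChatGPT centroid ends up closer to NMT than to HT. These numbers only demonstrate the mechanics. Standardizing first keeps features measured on large scales, such as raw counts, from dominating the distance.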

Paper Structure (18 sections, 10 figures, 8 tables)

Introduction
Related Work
Machine Translation and Large Language Models
Comparative Studies of NMT, ChatGPT, and HT
Multi-Feature Methods in Translation Studies
Methodology
Corpus Building and Text Processing
Feature Extraction
Distinguishability Study via Clustering and Classification
Multidimensional Analysis
Calculation of the Pairwise Euclidean Distances and Visualization with t-SNE
Results and Findings
Clustering and Classification Results
Interpreting Dimensions as a Result of Co-occurring Features
Calculating and Visualizing Distances
...and 3 more sections

Figures (10)

Figure 1: t-SNE visualization of distance distributions between translations by human, NMT, and ChatGPT.
Figure 2: An illustration of rolling stylometry adapted from Eder (2016).
Figure 3: Hierarchical clustering dendrogram.
Figure 4: Correlation heatmap among the 74 statistically significant features.
Figure 5: Scree plot of eigenvalues by factor number. It helps determine how many factors to retain by locating the point where the eigenvalues drop sharply, beyond which additional factors add little explained variance.
...and 5 more figures
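Figure 5 describes reading the number of factors off a scree plot. As a rough, hypothetical aid, the sketch below automates two common heuristics on a list of eigenvalues:

- The Kaiser criterion keeps factors whose eigenvalue exceeds 1.
- A largest-drop rule keeps the factors before the steepest fall between consecutive eigenvalues.

The sketch assumes the eigenvalues have already been computed, for instance from the correlation matrix of the selected features with `numpy.linalg.eigvalsh`. Which retention rule the authors actually applied is not stated here.

```python
from typing import List


def kaiser_count(eigenvalues: List[float]) -> int:
    """Number of factors whose eigenvalue exceeds 1 (Kaiser criterion)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def scree_elbow(eigenvalues: List[float]) -> int:
    """Number of factors to keep under a simple largest-drop heuristic.

    Eigenvalues are sorted in descending order. If the biggest drop lies
    between factor k and factor k + 1 (1-based), k factors are kept.
    """
    ev = sorted(eigenvalues, reverse=True)
    if len(ev) < 2:
        return len(ev)
    drops = [ev[i] - ev[i + 1] for i in range(len(ev) - 1)]
    # index() returns the first position of the maximum drop.
    return drops.index(max(drops)) + 1
```

Take the invented eigenvalues `[4.0, 3.5, 3.2, 1.1, 0.6, 0.4]`. The Kaiser criterion retains four factors, while the largest-drop rule retains three. This disagreement is one reason such heuristics are typically read alongside the plot itself rather than applied mechanically.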
