FFN: a Fine-grained Chinese-English Financial Domain Parallel Corpus
Yuxin Fu, Shijing Si, Leyi Mai, Xi-ang Li
TL;DR
The paper addresses the need for robust Chinese–English MT in finance by introducing FFN, a manually aligned parallel corpus spanning 2014–2023 with main texts and titles. It evaluates LLMs (ChatGPT, ERNIE-Bot) against DeepL, Google, and an OpenNMT baseline, and analyzes how prompts influence translation quality. Key findings show that translation software generally outperforms LLMs on BLEU, TER, and chrF, with EN-ZH directions often stronger than ZH-EN, while prompts can affect LLM outputs but do not universally resolve core issues like mispunctuation and financial terminology errors. The FFN corpus provides a high-quality benchmark for future research, facilitating targeted improvements in domain-specific prompting and training data for finance MT. The work highlights practical implications for deploying MT in finance and offers a publicly available resource to drive further advancements in this sparsely explored area.
Abstract
Large Language Models (LLMs) have significantly advanced the field of machine translation, though their effectiveness within the financial domain remains largely underexplored. To probe this issue, we constructed a fine-grained Chinese-English parallel corpus of financial news called FFN. We acquired financial news articles published between January 1, 2014, and December 31, 2023, from mainstream media websites such as CNN, FOX, and China Daily. The dataset consists of 1,013 main texts and 809 titles, all of which have been manually corrected. We measured the translation quality of two LLMs, ChatGPT and ERNIE-Bot, using BLEU, TER, and chrF scores as the evaluation metrics. For comparison, we also trained an OpenNMT model on our dataset. We detail the problems of LLMs and provide in-depth analysis, intending to stimulate further research and solutions in this largely uncharted territory. Our research underlines the need to optimize LLMs for financial translation to ensure accuracy and quality.
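
Of the three metrics named in the abstract, chrF is the simplest to illustrate, since it compares character n-grams and therefore applies to Chinese and English output without word segmentation. The following is a minimal sketch of a chrF-style score, assuming whitespace is ignored, n-gram orders 1 to 6 are used, and recall is weighted with beta = 2. The names `char_ngrams` and `chrf_sketch` are hypothetical, and this is not the evaluation script used in the paper; a maintained implementation would normally be preferred for reported results.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring spaces (an assumption of this sketch)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_sketch(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF-style score in [0, 1] for one hypothesis/reference pair."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # Skip orders longer than either string.
            continue
        # Clipped overlap: each n-gram counts at most as often as it appears in both.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta combination; beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical hypothesis and reference score 1.0, and a pair with no shared characters scores 0.0; a partially matching translation falls in between.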
