GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

TL;DR

GenTranslate introduces a generative, LLM-driven paradigm to fuse information from N-best translation hypotheses produced by a foundation model, addressing the information loss of traditional top-1 decoding. By coupling SeamlessM4T with efficient LLM finetuning (LLaMA-Adapter) and a dedicated HypoTranslate dataset, the approach enables autoregressive generation of high-quality translations from diverse candidates. Across multiple multilingual ST and MT benchmarks (FLEURS, CoVoST-2, MuST-C, FLORES, WMT), GenTranslate consistently outperforms state-of-the-art baselines, achieving notable BLEU gains and SOTA results in several directions. The work offers a practical path for leveraging LLMs in translation pipelines and provides resources for finetuning, marking a step forward in multilingual generative translation.

Abstract

Recent advances in large language models (LLMs) have pushed forward the development of multilingual speech and machine translation by reducing representation errors and incorporating external knowledge. However, both translation tasks typically rely on beam search decoding and top-1 hypothesis selection for inference. These techniques struggle to fully exploit the rich information in the diverse N-best hypotheses, making them suboptimal for translation tasks that require a single, high-quality output sequence. In this paper, we propose a new generative paradigm for translation tasks, namely "GenTranslate", which builds upon LLMs to generate better results from the diverse translation versions in the N-best list. Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the information in the N-best candidates to generate a higher-quality translation result. Furthermore, to support LLM finetuning, we build and release a HypoTranslate dataset that contains over 592K hypotheses-translation pairs in 11 languages. Experiments on various speech and machine translation benchmarks (e.g., FLEURS, CoVoST-2, WMT) demonstrate that GenTranslate significantly outperforms the state-of-the-art model.
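To make the N-best-to-translation idea concrete, here is a minimal sketch of how an N-best list might be packed into a single instruction prompt for an LLM. The function name `build_nbest_prompt`, the prompt wording, and the example hypotheses are all assumptions made for illustration; they are not the template or data used in the paper.

```python
from typing import List


def build_nbest_prompt(hypotheses: List[str], target_language: str = "English") -> str:
    """Format an N-best list as one instruction prompt (hypothetical template)."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    # Number the candidates in rank order so the LLM can see the model's preference.
    numbered = "\n".join(
        f"{rank}. {hyp.strip()}" for rank, hyp in enumerate(hypotheses, start=1)
    )
    return (
        f"Below are the {len(hypotheses)} best {target_language} translation "
        "hypotheses, ranked by the translation model. Combine their information "
        "and write the single most accurate translation.\n\n"
        f"### Hypotheses:\n{numbered}\n\n"
        "### Translation:\n"
    )


# Made-up N-best list for illustration only.
nbest = [
    "The weather is nice today.",
    "Today the weather is beautiful.",
    "The weather today is nice.",
]
prompt = build_nbest_prompt(nbest)
print(prompt)
```

The resulting string would be passed to a finetuned LLM, which generates the final translation autoregressively after the `### Translation:` marker; top-1 decoding, by contrast, would simply return the first entry of `nbest`.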

Paper Structure

This paper contains 41 sections, 6 equations, 6 figures, and 17 tables.

Figures (6)

  • Figure 1: Illustration of (a) Typical seq2seq translation with beam search decoding and top-1 hypothesis selection, (b) our "GenTranslate" with LLM integration.
  • Figure 2: t-SNE visualization of the n-gram tokens (n=1,2,3) in the ST 1-best hypothesis (green), the 2 to $N$-best hypotheses (blue), and the ground-truth translation (orange), where the text embeddings are extracted using SBERT (Reimers and Gurevych, 2019). It indicates that the 2 to $N$-best hypotheses contain richer information than the 1-best for generating the ground-truth translation.
  • Figure 3: Left: Overview of the GenTranslate paradigm (e.g., De$\rightarrow$En). Right: Details of efficient LLM finetuning.
  • Figure 4: Illustration of the "ASR+GenTranslate" system for the ST task, as introduced in Table \ref{table:asr_gentrans} and §\ref{sssec:roles}. This system brings LLMs into the translation process by combining translation with the $N$-best integration process.
  • Figure 5: t-SNE visualization of n-grams in the 1-best hypothesis (green), the ground-truth translation (orange), and the GenTranslate output (purple). It extends Fig. 2.
  • ...and 1 more figure
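
The observation in the Figure 2 caption, that lower-ranked hypotheses carry information the 1-best lacks, can be approximated with a much simpler measure than the SBERT and t-SNE analysis the caption describes. The sketch below substitutes plain n-gram recall against the reference: the fraction of reference n-grams (n=1,2,3) that appear in a set of hypotheses. The function names, whitespace tokenization, and example sentences are assumptions for illustration, not the paper's procedure or data.

```python
from typing import List, Set, Tuple


def ngrams(text: str, max_n: int = 3) -> Set[Tuple[str, ...]]:
    """Return all word n-grams of length 1..max_n (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def reference_recall(hypotheses: List[str], reference: str, max_n: int = 3) -> float:
    """Fraction of reference n-grams found in at least one hypothesis."""
    ref = ngrams(reference, max_n)
    if not ref:
        return 0.0
    covered: Set[Tuple[str, ...]] = set()
    for hyp in hypotheses:
        covered |= ngrams(hyp, max_n)
    return len(ref & covered) / len(ref)


# Made-up example: compare coverage of the 1-best alone with the full N-best list.
reference = "the cat sat on the mat"
nbest = [
    "a cat sat on a mat",
    "the cat sits on the mat",
    "the cat sat on mat",
]
print("1-best recall:", reference_recall(nbest[:1], reference))
print("N-best recall:", reference_recall(nbest, reference))
```

Because the full N-best list is a superset of the 1-best, its recall can never be lower; how much higher it is gives a rough sense of the extra information an LLM could draw on.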