
EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation

Yuqiao Wen, Behzad Shayegh, Chenyang Huang, Yanshuai Cao, Lili Mou

TL;DR

The paper addresses zero-shot multilingual machine translation by introducing EBBS, an ensemble decoding framework that uses bi-level beam search to let each translation path explore its own predictions while a soft-voting mechanism synchronizes results at each generation step. By combining direct and pivot translations, EBBS improves zero-shot quality over traditional direct or pivot approaches and surpasses other ensemble methods on IWSLT and Europarl. Additionally, EBBS-based distillation leverages high-quality ensemble outputs to train a single, efficient model without increasing inference cost, sometimes even boosting translation quality. The approach offers a practical path to stronger zero-shot MT with scalable inference, validated by comprehensive experiments and analyses.

Abstract

Zero-shot translation ability emerges when a multilingual model is trained on certain translation directions: the model can then translate directly in unseen directions. Alternatively, zero-shot translation can be accomplished by pivoting through a third language (e.g., English). In our work, we observe that both direct and pivot translations are noisy and achieve unsatisfactory performance. We propose EBBS, an ensemble method with a novel bi-level beam search algorithm. At the lower level, each ensemble component explores its own predictions step by step. At the upper level, the components are synchronized by a "soft voting" mechanism. Results on two popular multilingual translation datasets show that EBBS consistently outperforms direct and pivot translations as well as existing ensemble techniques. Further, we can distill the ensemble's knowledge back into the multilingual model to improve inference efficiency. Notably, our EBBS-based distillation does not sacrifice, and sometimes even improves, translation quality.
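The bi-level search described in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under our own assumptions, not the authors' implementation.

- Each ensemble component (the direct model, or a pivot path whose target-side model is conditioned on an intermediate translation) is abstracted as a hypothetical function from a target prefix to a next-token distribution.
- At the lower level, every component proposes its own top candidates for each shared hypothesis.
- At the upper level, the union of proposals is rescored by soft voting. Here soft voting is assumed to mean averaging the components' probabilities; the paper's exact scoring may differ.

The names `bi_level_beam_search`, `soft_vote` and `make_component`, and all toy probabilities, are illustrative only.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]
# Hypothetical abstraction: a component maps a target prefix to a
# next-token probability distribution.
Component = Callable[[Prefix], Dict[str, float]]
EOS = "</s>"


def soft_vote(components: List[Component], prefix: Prefix, token: str) -> float:
    """Upper level (assumed form): average the components' probabilities for `token`."""
    return sum(c(prefix).get(token, 0.0) for c in components) / len(components)


def bi_level_beam_search(
    components: List[Component], beam_size: int = 2, max_len: int = 10
) -> Tuple[Prefix, float]:
    beam: List[Tuple[Prefix, float]] = [((), 0.0)]  # (prefix, summed log soft-vote score)
    finished: List[Tuple[Prefix, float]] = []
    for _ in range(max_len):
        # Lower level: every component proposes its own top-`beam_size`
        # continuations of every shared hypothesis.
        candidates = set()
        for prefix, _ in beam:
            for comp in components:
                dist = comp(prefix)
                top = sorted(dist, key=lambda t: (-dist[t], t))[:beam_size]
                candidates.update(prefix + (tok,) for tok in top)
        # Upper level: rescore the union of proposals with soft voting,
        # so all components stay synchronized on one shared beam.
        prev = dict(beam)
        scored = []
        for cand in candidates:
            p = soft_vote(components, cand[:-1], cand[-1])
            if p > 0.0:
                scored.append((cand, prev[cand[:-1]] + math.log(p)))
        scored.sort(key=lambda x: (-x[1], x[0]))  # deterministic tie-breaking
        beam = []
        for cand, score in scored[:beam_size]:
            if cand[-1] == EOS:
                finished.append((cand, score))
            else:
                beam.append((cand, score))
        if not beam:
            break
    finished.extend(beam)  # keep unfinished hypotheses if max_len was reached
    return max(finished, key=lambda x: x[1])


def make_component(table: Dict[Prefix, Dict[str, float]]) -> Component:
    # Toy model: any prefix missing from `table` emits EOS with probability 1.
    return lambda prefix: table.get(prefix, {EOS: 1.0})


# Made-up first-step distributions for one Italian-to-Dutch source sentence.
direct = make_component({(): {"ciao": 0.55, "hallo": 0.45}})
pivot = make_component({(): {"ciao": 0.20, "hallo": 0.80}})

print(bi_level_beam_search([direct])[0])         # → ('ciao', '</s>')
print(bi_level_beam_search([direct, pivot])[0])  # → ('hallo', '</s>')
```

In this toy run, the direct component alone prefers "ciao". Averaging in the pivot component's distribution shifts the shared beam to "hallo", with soft-vote probabilities of 0.625 vs. 0.375. The sketch deliberately omits length normalization and batching.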

Paper Structure

This paper contains 19 sections, 6 equations, 3 figures, 10 tables, and 2 algorithms.

Figures (3)

  • Figure 1: Illustration of our EBBS algorithm.
  • Figure 2: Analysis of the number of ensemble components for Italian-to-Dutch translation on Europarl.
  • Figure 3: Inference time analysis on the test set of Italian-to-Dutch translation from Europarl. Experiments were conducted on an AMD EPYC 7313 CPU and an NVIDIA RTX A6000 GPU, with a batch size of 300 samples.