
SiLLM: Large Language Models for Simultaneous Machine Translation

Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

TL;DR

This work proposes SiLLM, which splits simultaneous machine translation (SiMT) into policy-decision and translation sub-tasks and delegates them to separate agents, thereby incorporating LLMs into SiMT. It also proposes a word-level policy adapted for LLMs, so that token-level policies determined by conventional SiMT models can be applied to them.

Abstract

Simultaneous Machine Translation (SiMT) generates translations while reading the source sentence, necessitating a policy to determine the optimal timing for reading and generating words. Despite the remarkable performance achieved by Large Language Models (LLM) across various NLP tasks, existing SiMT methods predominantly focus on conventional transformers, employing a single model to concurrently determine the policy and generate the translations. However, given the complexity of SiMT, it is challenging to effectively address both tasks with a single model. Therefore, there is a need to decouple the SiMT task into policy-decision and translation sub-tasks. We propose SiLLM, which delegates the two sub-tasks to separate agents, thereby incorporating LLM into SiMT. The policy-decision agent is managed by a conventional SiMT model, responsible for determining the translation policy. The translation agent, leveraging the capabilities of LLM, generates translation using the partial source sentence. The two agents collaborate to accomplish SiMT. To facilitate the application of token-level policies determined by conventional SiMT models to LLM, we propose a word-level policy adapted for LLM. Experiments on two datasets demonstrate that, with a small amount of data for fine-tuning LLM, SiLLM attains state-of-the-art performance.
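The collaboration between the two agents can be sketched as a simple read/write loop. This is a minimal illustration, not the authors' implementation: the policy-decision agent is stood in for by a wait-k rule (generate the i-th target word after reading k + i - 1 source words), and the translation agent is a toy word-for-word lookup rather than an LLM. The names `wait_k_policy`, `toy_translation_agent`, and `simultaneous_translate` are hypothetical and exist only for this example.

```python
from typing import Callable, Dict, List, Tuple

READ, WRITE = "READ", "WRITE"
EOS = "</s>"

def wait_k_policy(k: int) -> Callable[[int, int, bool], str]:
    """Policy-decision agent (assumed wait-k): WRITE once the source
    is k words ahead of the target, or once the source is exhausted."""
    def decide(num_read: int, num_written: int, source_finished: bool) -> str:
        if source_finished or num_read - num_written >= k:
            return WRITE
        return READ
    return decide

def toy_translation_agent(lexicon: Dict[str, str]) -> Callable[[List[str], List[str]], str]:
    """Translation agent stand-in (hypothetical): translates word by word
    from a lexicon. A real system would query an LLM here instead."""
    def translate_next(partial_source: List[str], target_prefix: List[str]) -> str:
        position = len(target_prefix)
        if position < len(partial_source):
            return lexicon.get(partial_source[position], "<unk>")
        return EOS  # nothing left to translate from the words read so far
    return translate_next

def simultaneous_translate(
    source_words: List[str],
    policy: Callable[[int, int, bool], str],
    translate_next: Callable[[List[str], List[str]], str],
    max_len: int = 50,
) -> Tuple[List[str], List[str]]:
    """Alternate between the two agents until the translation ends."""
    num_read = 0
    target: List[str] = []
    actions: List[str] = []
    while len(target) < max_len:
        source_finished = num_read == len(source_words)
        action = policy(num_read, len(target), source_finished)
        actions.append(action)
        if action == READ:
            num_read += 1  # reveal one more source word
        else:
            # The translation agent only sees the source read so far.
            word = translate_next(source_words[:num_read], target)
            if word == EOS:
                break
            target.append(word)
    return target, actions

lexicon = {"ich": "I", "sehe": "see", "dich": "you"}
target, actions = simultaneous_translate(
    ["ich", "sehe", "dich"], wait_k_policy(2), toy_translation_agent(lexicon)
)
print(target)
print(actions)
```

The point of the split is that either agent can be replaced independently: a learned policy can take the place of `wait_k_policy` without touching the translation side, and vice versa.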

Paper Structure (25 sections, 7 equations, 7 figures, 6 tables)

This paper contains 25 sections, 7 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: The framework of SiLLM. The numbers in the diagram signify the execution sequence of SiLLM. The red lines denote operations performed when the policy-decision agent determines to read source words. The blue lines indicate operations carried out when the decision for generation is made. The black line denotes the operation shared between both decision types.
  • Figure 2: An illustration of incorporating boundary restrictions into the word-level policy. The hyperparameters $B$ and $T$ in the figure are set to 1 and 3, respectively. In the absence of boundary restrictions, the word-level policy generates $y_1$ after reading $x_4$. However, our approach modifies it to generate $y_1$ upon reading $x_3$.
  • Figure 3: Performance of different SiMT methods on De$\rightarrow$En and En$\rightarrow$De tasks.
  • Figure 4: The impact of different quantities of training data during SFT on Wait-$k$-SiLLM+SFT. The experiments are conducted on the De$\rightarrow$En task.
  • Figure 5: The hallucination rate (HR) of different SiMT methods. The results are based on the De$\rightarrow$En task.
  • ...and 2 more figures
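
The boundary restriction described in the Figure 2 caption can be sketched as a clamp on the word-level policy. The exact formulation is not given in this summary, so the following is an assumption: the number of source words read before generating $y_i$ is forced into the range $[B + i - 1,\; T + i - 1]$, capped at the source length. This reading is chosen only because it reproduces the caption's example ($B = 1$, $T = 3$, with $y_1$ moved from $x_4$ to $x_3$). The function name `apply_boundaries` is hypothetical.

```python
from typing import List

def apply_boundaries(word_policy: List[int], bottom: int, top: int, source_len: int) -> List[int]:
    """Clamp a word-level policy into assumed boundaries.

    word_policy[i - 1] is the number of source words read before
    generating target word y_i. The range [bottom + i - 1, top + i - 1]
    is an assumption inferred from the Figure 2 caption.
    """
    restricted: List[int] = []
    for i, num_read in enumerate(word_policy, start=1):
        low = bottom + i - 1
        high = top + i - 1
        num_read = max(low, min(num_read, high))
        num_read = min(num_read, source_len)  # cannot read past the source
        if restricted:
            # Source words already read cannot be "unread".
            num_read = max(num_read, restricted[-1])
        restricted.append(num_read)
    return restricted

# Caption example: without restrictions y_1 waits for x_4; with B=1, T=3 it waits for x_3.
print(apply_boundaries([4, 4, 5], bottom=1, top=3, source_len=5))
```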