MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators

Zhixing Tan, Xiangwen Zhang, Shuo Wang, Yang Liu

TL;DR

This work introduces Multi-Stage Prompting (MSP), a framework that splits translation into an encoding stage, a re-encoding stage, and a decoding stage, each guided by its own continuous prompt, to better align pre-trained language models with translation tasks. MSP uses deep continuous prompts and a scaled reparameterization scheme, which makes training efficient while the backbone LM stays fixed. On Romanian-English, English-German, and English-Chinese translation, MSP substantially outperforms single-stage prompting methods and remains highly parameter-efficient compared to larger encoder-decoder models. The results suggest that decoder-only LMs can be repurposed as competitive MT systems at a favorable training cost, and the analyses indicate that translation knowledge largely resides in the LM itself rather than in the prompts.

Abstract

Prompting has recently been shown to be a promising approach for applying pre-trained language models to downstream tasks. We present Multi-Stage Prompting (MSP), a simple and automatic approach for applying pre-trained language models to translation tasks. To better mitigate the discrepancy between pre-training and translation, MSP divides the translation process of a pre-trained language model into multiple separate stages: the encoding stage, the re-encoding stage, and the decoding stage. In each stage, we independently apply a different continuous prompt so that the pre-trained language model can better shift to the translation task. We conduct extensive experiments on three translation tasks. The experiments show that our method significantly improves the translation performance of pre-trained language models.
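
The three-stage flow described in the abstract can be traced with a toy sketch. Everything in it is hypothetical: `frozen_lm`, the prompt labels `P_enc`, `P_reenc`, and `P_dec`, and the string "activations" are placeholders, not the authors' code. The assumption that each stage conditions on the activations of the previous stage is also not spelled out in this section. The sketch only illustrates that one frozen model is called three times, each time with a different prompt and with position ids restarting at zero (as noted in the caption of Figure 1).

```python
from typing import Dict, List, Sequence, Tuple


def frozen_lm(prompt: str, past: Sequence[str],
              tokens: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Hypothetical stand-in for one pass of the frozen LM.

    Returns labelled strings in place of real activations, plus the
    position ids assigned to `tokens`.
    """
    position_ids = list(range(len(tokens)))  # restart at 0 in every stage
    activations = [
        f"{prompt}|past={len(past)}|{tok}@{pos}"
        for tok, pos in zip(tokens, position_ids)
    ]
    return activations, position_ids


def multi_stage_trace(source: Sequence[str],
                      target_prefix: Sequence[str]) -> Dict[str, tuple]:
    # Stage 1 (encoding): source tokens with the encoding prompt.
    enc, enc_pos = frozen_lm("P_enc", [], source)
    # Stage 2 (re-encoding): the same source tokens with a different prompt,
    # assumed here to condition on the stage-1 activations.
    reenc, reenc_pos = frozen_lm("P_reenc", enc, source)
    # Stage 3 (decoding): target tokens with the decoding prompt,
    # assumed here to condition on the stage-2 activations.
    dec, dec_pos = frozen_lm("P_dec", reenc, target_prefix)
    return {
        "encoding": (enc, enc_pos),
        "re-encoding": (reenc, reenc_pos),
        "decoding": (dec, dec_pos),
    }


trace = multi_stage_trace(["Good", "morning"], ["Guten", "Morgen"])
for stage, (acts, pos) in trace.items():
    print(stage, pos, acts)
```

In a real system the strings would be per-layer key/value tensors and the prompts would be trained vectors; the trace is only meant to make the order of calls and the reset of position ids visible.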

Paper Structure

This paper contains 36 sections, 8 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Overview of using prompts for steering a multilingual GPT (mGPT) model to machine translation tasks. Note that we reset the position ids during each stage in multi-stage prompting for ease of implementation. All stages use the same mGPT model.
  • Figure 2: A deep continuous prompt is prepended to the inputs in all attention layers, which affects the computation of all attention layers. We do not distinguish keys and values here for simplicity.
  • Figure 3: Detailed computations involved in the multi-stage prompting for machine translation tasks. We use rectangles to denote prompt vectors and rounded rectangles to denote activations.
  • Figure 4: Comparison between MSP and prefix-tuning on the WMT14 En-De translation task with different prompt lengths.
  • Figure 5: Comparison of training with scaled reparameterization and without reparameterization on the WMT14 translation task. BLEU scores are evaluated on newstest2013.
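
The idea in the caption of Figure 2, a prompt prepended in every attention layer, can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions, not the paper's implementation: attention is single-head and causal, the layers have no learned projections (queries, keys, and values are the hidden states themselves), and all function names and numbers are made up for illustration. Unlike the figure, the sketch keeps prompt keys and prompt values separate.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def prompted_attention(queries, keys, values, prompt_keys, prompt_values):
    """Causal single-head attention with a prompt prepended to keys and values."""
    n_prompt = len(prompt_keys)
    all_keys = prompt_keys + keys
    all_values = prompt_values + values
    dim = len(all_values[0])
    scale = math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Token i sees every prompt slot plus tokens 0..i.
        visible = n_prompt + i + 1
        w = softmax([dot(q, k) / scale for k in all_keys[:visible]])
        out = [sum(w[j] * all_values[j][c] for j in range(visible))
               for c in range(dim)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def deep_prompt_forward(hidden, layer_prompts):
    """Run one prompted attention layer per (prompt_keys, prompt_values) pair."""
    for prompt_keys, prompt_values in layer_prompts:
        # Toy layer: no projections, the hidden states act as Q, K and V.
        hidden, _ = prompted_attention(hidden, hidden, hidden,
                                       prompt_keys, prompt_values)
    return hidden


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layer_prompts = [
    ([[0.5, 0.5]], [[0.1, 0.2]]),                          # layer 1: one prompt slot
    ([[0.0, 1.0], [1.0, 0.0]], [[0.3, 0.3], [0.2, 0.4]]),  # layer 2: two prompt slots
]
out = deep_prompt_forward(hidden, layer_prompts)
_, first_layer_weights = prompted_attention(hidden, hidden, hidden,
                                            *layer_prompts[0])
```

Because each layer receives its own prompt, the prompt influences the computation at every depth rather than only at the input embedding, which is the distinction the caption draws.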