MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators
Zhixing Tan, Xiangwen Zhang, Shuo Wang, Yang Liu
TL;DR
This work introduces Multi-Stage Prompting (MSP), a framework that splits translation into encoding, re-encoding, and decoding stages, each guided by stage-specific continuous prompts to better align pre-trained language models with translation tasks. By employing deep continuous prompts and a scaled reparameterization scheme, MSP enables efficient training while keeping the backbone LM fixed. Across Romanian-English, English-German, and English-Chinese directions, MSP substantially outperforms single-stage prompting methods and remains highly parameter-efficient compared to larger encoder-decoder models. The results demonstrate MSP’s potential to repurpose decoder-only LMs as competitive MT systems with favorable training costs, while analyses reveal that translation knowledge largely resides in the LM itself rather than the prompts. Overall, MSP provides a practical, scalable approach to leveraging pre-trained LMs for translation with significant gains and insights into prompt-based translation dynamics.
Abstract
Prompting has recently been shown to be a promising approach for applying pre-trained language models to downstream tasks. We present Multi-Stage Prompting (MSP), a simple and automatic approach for leveraging pre-trained language models for translation tasks. To better mitigate the discrepancy between pre-training and translation, MSP divides the translation process performed by a pre-trained language model into three separate stages: the encoding stage, the re-encoding stage, and the decoding stage. In each stage, we independently apply a different continuous prompt so that the pre-trained language model can better adapt to the translation task. We conduct extensive experiments on three translation tasks. The experiments show that our method significantly improves the translation performance of pre-trained language models.
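As a rough illustration of the staged pipeline described above, the following toy sketch runs one frozen attention layer three times, each time with a different continuous prompt placed in front of its context. It is not the authors' implementation. The single-layer model, the hidden size, the function names (`frozen_layer`, `make_prompts`, `multi_stage_forward`), and the way earlier-stage activations are handed to later stages as extra context are all assumptions made for illustration. Causal masking and training are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size of the toy model (assumption)

# Frozen "backbone": one attention layer whose weights are never updated.
W_q, W_k, W_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))


def frozen_layer(x, context):
    """Attend from x (n, D) to the context (m, D) followed by x itself."""
    memory = np.concatenate([context, x], axis=0)
    q, k, v = x @ W_q, memory @ W_k, memory @ W_v
    scores = q @ k.T / np.sqrt(D)
    # Numerically stable softmax over the memory positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def make_prompts(length=4):
    """One independent continuous prompt per stage (hypothetical layout)."""
    stages = ("encode", "reencode", "decode")
    return {s: rng.normal(size=(length, D)) * 0.1 for s in stages}


def multi_stage_forward(src, tgt, prompts):
    # Stage 1: encode the source under the encoding prompt.
    enc = frozen_layer(src, prompts["encode"])
    # Stage 2: re-encode the source, also attending to stage-1 activations.
    reenc = frozen_layer(src, np.concatenate([prompts["reencode"], enc]))
    # Stage 3: process target positions, attending to stage-2 activations.
    dec = frozen_layer(tgt, np.concatenate([prompts["decode"], reenc]))
    return enc, reenc, dec


prompts = make_prompts()
src = rng.normal(size=(5, D))  # stand-in for 5 source token embeddings
tgt = rng.normal(size=(3, D))  # stand-in for 3 target token embeddings
enc, reenc, dec = multi_stage_forward(src, tgt, prompts)
print(enc.shape, reenc.shape, dec.shape)
```

In a setup like this, only the arrays in `prompts` would be trainable, while `W_q`, `W_k`, and `W_v` stay fixed. This matches the TL;DR's description of keeping the backbone LM fixed while learning stage-specific prompts.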
