BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models

Shaolei Zhang, Qingkai Fang, Zhuocheng Zhang, Zhengrui Ma, Yan Zhou, Langlin Huang, Mengyu Bu, Shangtong Gui, Yunji Chen, Xilin Chen, Yang Feng

TL;DR

BayLing addresses the challenge of non-English performance in English-pretrained LLMs by using an interactive translation framework to align languages and transfer English instruction-following abilities without additional non-English pretraining. Built on LLaMA, BayLing is fine-tuned with automatically generated interactive translation instructions and general task data, achieving competitive translation performance and strong cross-lingual capabilities across Chinese and German with 13B-parameter models. The paper presents extensive multilingual translation, interactive translation, general-task benchmarks, and standardized tests, showing BayLing approaching GPT-3.5-turbo and surpassing many open-source baselines while demonstrating robust cross-language transfer through language alignment. It posits interactive translation as an efficient path to scale cross-lingual instruction-following and knowledge transfer, potentially informing future multilingual LLM development.

Abstract

Large language models (LLMs) have demonstrated remarkable prowess in language understanding and generation. Advancing from foundation LLMs to instruction-following LLMs, instruction tuning plays a vital role in aligning LLMs to human preferences. However, the existing LLMs are usually focused on English, leading to inferior performance in non-English languages. In order to improve the performance for non-English languages, it is necessary to collect language-specific training data for foundation LLMs and construct language-specific instructions for instruction tuning, both of which are heavy loads. To minimize human workload, we propose to transfer the capabilities of language generation and instruction following from English to other languages through an interactive translation task. We have developed BayLing, an instruction-following LLM, by utilizing LLaMA as the foundation LLM and automatically constructing interactive translation instructions for instruction tuning. Extensive assessments demonstrate that BayLing achieves comparable performance to GPT-3.5-turbo, despite utilizing a considerably smaller parameter size of only 13 billion. Experimental results on translation tasks show that BayLing achieves 95% of single-turn translation capability compared to GPT-4 with automatic evaluation and 96% of interactive translation capability compared to GPT-3.5-turbo with human evaluation. To estimate the performance on general tasks, we created a multi-turn instruction test set called BayLing-80. The experimental results on BayLing-80 indicate that BayLing achieves 89% of performance compared to GPT-3.5-turbo. BayLing also demonstrates outstanding performance on knowledge assessment of Chinese GaoKao and English SAT, second only to GPT-3.5-turbo among a multitude of instruction-following LLMs. Demo, homepage, code and models of BayLing are available.
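
To make the idea of an interactive translation instruction more concrete, the following is a minimal sketch of how one multi-turn sample could be represented: a first translation request, followed by a revision request on the draft. It is an illustration only and not BayLing's actual data format; the class names, field names, role labels, and prompt wording are all hypothetical assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    # Hypothetical role labels; the real instruction format may differ.
    role: str  # "user" or "assistant"
    text: str


@dataclass
class InteractiveTranslationSample:
    """One multi-turn sample: an initial translation plus follow-up revisions."""
    turns: List[Turn] = field(default_factory=list)

    def add_exchange(self, user_text: str, assistant_text: str) -> None:
        # Each exchange is one user request followed by one assistant reply.
        self.turns.append(Turn("user", user_text))
        self.turns.append(Turn("assistant", assistant_text))

    def to_prompt(self) -> str:
        # Flatten the dialogue into plain text, one turn per line.
        return "\n".join(f"{t.role}: {t.text}" for t in self.turns)


def build_sample(source: str, draft: str,
                 revision_request: str, revised: str) -> InteractiveTranslationSample:
    """Assemble a two-round sample: translate first, then revise on request."""
    sample = InteractiveTranslationSample()
    # Assumed prompt wording, for illustration only.
    sample.add_exchange(f"Translate into English: {source}", draft)
    sample.add_exchange(revision_request, revised)
    return sample


example = build_sample(
    source="你好，世界",
    draft="Hello, world",
    revision_request="Please make it more formal.",
    revised="Greetings, world",
)
```

Calling `example.to_prompt()` returns the four turns as plain text. The second exchange is what separates an interactive sample from a single-turn translation pair: the model has to follow a new instruction while staying grounded in the earlier source sentence and draft.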

Paper Structure (27 sections, 14 figures, 10 tables)

Figures (14)

  • Figure 1: Overview of BayLing. BayLing is built upon LLaMA and fine-tuned with instruction data of interactive translation task and general tasks.
  • Figure 2: An illustration of the interactive translation task.
  • Figure 3: Performance comparison on the WMT22 Chinese$\Leftrightarrow$English translation task.
  • Figure 4: Performance comparison on the WMT22 German$\Leftrightarrow$English translation task.
  • Figure 5: Zero-shot translation performance of BayLing on WMT22 multilingual translation tasks. These translation directions do not appear in BayLing's fine-tuning instructions, but it is uncertain whether they are zero-shot in other models for comparison.
  • ...and 9 more figures