Multi-Agent Collaboration for Multilingual Code Instruction Tuning
Jian Yang, Wei Zhang, Jiaxi Yang, Yibo Miao, Shanghaoran Quan, Zhenhe Wu, Qiyao Peng, Liqun Yang, Tianyu Liu, Zeyu Cui, Binyuan Hui, Junyang Lin
TL;DR
The paper tackles cross-language interference in code-focused LLMs by introducing a multilingual multi-agent framework that generates high-quality multilingual instruction data (x-Instruct) for fine-tuning Qwen2.5-xCoder. Each language-specific agent maintains generation memory and participates in centralized or parallel discussions to synthesize instructions and solutions applicable across languages, enabling efficient cross-language knowledge transfer. The authors construct seed data from code snippets, augment it through multi-agent collaboration and memory reflection, and train with supervised fine-tuning and preference-based optimization (DPO) across languages. Experiments on Python benchmarks and the multi-language MultiPL-E benchmark show that Qwen2.5-xCoder achieves strong cross-language performance, narrowing the cross-language gap and demonstrating effective multilingual code generation and understanding. The framework offers a scalable approach to multilingual code instruction tuning, with potential implications for broader multilingual software engineering tasks.
Abstract
Recent advances in code understanding and generation demonstrate that code LLMs fine-tuned on a high-quality instruction dataset can gain powerful capabilities to address wide-ranging code-related tasks. However, most existing methods view each programming language in isolation and ignore knowledge transfer among different programming languages. To bridge the gap among different programming languages, we introduce a novel multi-agent collaboration framework to enhance multilingual instruction tuning for code LLMs, where multiple language-specific intelligent agent components with generation memory work together to transfer knowledge from one language to another efficiently and effectively. Specifically, we first generate language-specific instruction data from code snippets and then provide the generated data as seed data for the language-specific agents. Multiple language-specific agents discuss and collaborate to formulate a new instruction and its corresponding solution (in either a new or an existing programming language). To further encourage cross-lingual transfer, each agent stores its generation history as memory and then summarizes its merits and faults. Finally, the high-quality multilingual instruction data is used to train Qwen2.5-xCoder, encouraging knowledge transfer among different programming languages. Experimental results on multilingual programming benchmarks demonstrate the superior performance of Qwen2.5-xCoder in sharing common knowledge, highlighting its potential to reduce the cross-lingual gap.
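The collaboration loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `LanguageAgent`, `collaborate`, and `echo_llm` are hypothetical, the prompts are invented for illustration, and the LLM call is replaced by a deterministic stub so the example runs on its own. It shows only the general shape: each language-specific agent drafts a proposal from a seed instruction, stores it in its generation memory, and a centralized step merges the proposals into one new instruction and solution.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; a real system would query a model."""
    return f"[generated] {prompt}"


@dataclass
class LanguageAgent:
    """A language-specific agent with generation memory (illustrative only)."""
    language: str
    llm: Callable[[str], str] = echo_llm
    memory: List[Dict[str, str]] = field(default_factory=list)

    def propose(self, seed_instruction: str) -> str:
        # Draft a proposal from the seed and record it in the generation memory.
        proposal = self.llm(f"As a {self.language} expert, adapt: {seed_instruction}")
        self.memory.append({"seed": seed_instruction, "proposal": proposal})
        return proposal

    def reflect(self) -> str:
        # Summarize merits and faults of the stored generation history.
        history = "; ".join(entry["proposal"] for entry in self.memory)
        return self.llm(f"Summarize merits and faults of: {history}")


def collaborate(
    agents: List[LanguageAgent],
    seed_instruction: str,
    target_language: str,
    llm: Callable[[str], str] = echo_llm,
) -> Dict[str, str]:
    """Centralized discussion: gather proposals, then merge them into one sample."""
    proposals = {agent.language: agent.propose(seed_instruction) for agent in agents}
    merged = "\n".join(f"{lang}: {text}" for lang, text in proposals.items())
    instruction = llm(f"Combine into one {target_language} instruction:\n{merged}")
    solution = llm(f"Solve in {target_language}: {instruction}")
    return {"language": target_language, "instruction": instruction, "solution": solution}


if __name__ == "__main__":
    agents = [LanguageAgent("Python"), LanguageAgent("Java")]
    sample = collaborate(agents, "Reverse a linked list", target_language="Rust")
    print(sample["instruction"])
    print(agents[0].reflect())
```

In this sketch the target language may differ from every agent's own language, mirroring the "new or existing programming language" case in the abstract; how the actual framework selects targets, structures the discussion, or uses the reflection summaries is not specified here.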
