MSCoT: Structured Chain-of-Thought Generation for Multiple Programming Languages

Naizhu Jin, Zhong Li, Tian Zhang, Qingkai Zeng

TL;DR

MSCoT addresses the challenge of multilingual code generation by introducing a structured CoT generation framework across 12 programming languages. It builds a large-scale dataset of 84,000 CoT samples via a three-agent framework (CQAgent, CTAgent, SCoTAgent) and fine-tunes a 7B CodeLLM with LoRA, guided by an instruction template. Empirical results across two CodeLLMs show MSCoT yields significant gains in Pass@1 and CoT-Pass@1, approaching GPT-4-based performance with substantially lower resource demands, and a human study confirms higher-quality CoTs. The work also contributes open-source resources to advance multilingual CoT research and practical code-generation tooling.
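As a point of reference for the Pass@1 figures mentioned above, here is a minimal sketch of how such a score is commonly computed. It uses the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. Whether MSCoT's evaluation uses exactly this estimator is an assumption, and the function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical. CoT-Pass@1 would presumably apply the same computation to code generated with the CoT supplied as guidance.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number that pass all tests,
    k: sampling budget being scored (k=1 gives Pass@1).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int = 1) -> float:
    """Average the per-problem estimates over (n, c) pairs (hypothetical helper)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("no results to score")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative data only: four problems, one greedy sample each, three pass.
    print(benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 1)], k=1))
```

With a single sample per problem (n = 1), Pass@1 reduces to the fraction of problems whose generated code passes all tests.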

Abstract

With the rapid development of code intelligence, the use of multiple programming languages is becoming increasingly widespread. However, most existing code generation models focus on one or a few programming languages, resulting in unsatisfactory performance in multilingual settings. Chain-of-Thought (CoT) reasoning can significantly improve model performance without retraining or fine-tuning the code generation model, by decomposing a complex code generation task into subtasks and deriving a solution for each subtask step by step. Nevertheless, existing CoT generation methods concentrate mainly on Python, and their performance on other programming languages remains unclear. To fill this gap, we first constructed a CoT generation dataset for 12 programming languages using a multi-agent approach. On this basis, we proposed MSCoT, a CoT generation method applicable to multiple programming languages. Supplying the generated CoT to a large code generation model improves that model's performance in multilingual settings. Through a large-scale empirical study, we compared the generalization ability of MSCoT with that of existing CoT generation methods across multiple programming languages and demonstrated the effectiveness of MSCoT. In addition, we designed a human study to assess the quality of the CoT generated by MSCoT. Finally, we open-sourced the model and dataset of MSCoT to promote research on CoT generation for multiple programming languages.

Paper Structure

This paper contains 21 sections, 4 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Example of CoT Generation
  • Figure 2: Performance of Code Generation Models on HumanEval-XL
  • Figure 3: Approach of MSCoT
  • Figure 4: The correlation heatmap of the generated CoT between COTTON and MSCoT under different programming languages