Chain-of-Thought in Neural Code Generation: From and For Lightweight Language Models

Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, Taolue Chen

TL;DR

This work shows that lightweight language models (lLMs) struggle to produce high-quality chains of thought (CoTs) for code generation, but that their performance improves substantially when they are guided by CoTs from a dedicated CoT-generation model. The authors introduce COTTON, a cost-efficient pipeline that combines discrete prompts with LoRA fine-tuning of CodeLlama-7B on CodeCoT-9k, a high-quality CoT dataset. The resulting CoTs are then used to improve code generation across multiple benchmarks, in some cases rivaling or surpassing much larger models. Extensive automatic and human evaluations show that COTTON-generated CoTs are of higher quality and significantly improve the code produced by lLMs. In some cases they also improve the performance of larger LLMs without full fine-tuning. Overall, the approach highlights the practical potential of CoT-based guidance to democratize code-generation capabilities on resource-constrained hardware.
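The TL;DR mentions LoRA fine-tuning of CodeLlama-7B. To illustrate why LoRA is cheap, the sketch below applies the standard LoRA update to a single linear layer, h = W0 x + (alpha / r) B A x. Here the pretrained weight W0 stays frozen and only the low-rank factors A (r x d_in) and B (d_out x r) are trained. This is a minimal pure-Python sketch of the generic technique, not COTTON's training code. The class name, the initialization scale, and the rank and alpha values are illustrative assumptions, since the summary does not report the paper's LoRA settings.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer W0 plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0) -> None:
        self.weight = weight  # frozen pretrained weight, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A: r x d_in, small random init (the 0.01 scale is an illustrative assumption).
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B: d_out x r, zero init, so the adapted layer starts out identical to W0.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)              # W0 x (W0 is never updated)
        delta = matvec(self.B, matvec(self.A, x))  # B (A x); only A and B are trained
        return [b + self.scale * d for b, d in zip(base, delta)]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
    print(layer.forward([1.0, 1.0]))        # B is zero, so this equals W0 x: [3.0, 7.0]
    print(lora_param_count(4096, 4096, 8))  # compare with 4096 * 4096 for full fine-tuning
```

With rank 8, a hypothetical 4096 x 4096 weight matrix gets 65,536 trainable parameters instead of 16,777,216, roughly 0.4% of the full count. Zero-initializing B means training starts exactly from the pretrained model's behavior.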

Abstract

Large Language Models (LLMs) have demonstrated remarkable potential in code generation, and integrating Chain-of-Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require manually written CoTs or LLMs with over 100 billion parameters to generate them, which impedes their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models (lLMs), defined as models with fewer than 10 billion parameters. Empirically, we find that most lLMs cannot generate high-quality CoTs under few-shot prompting, but can take advantage of high-quality CoTs generated elsewhere to improve their code-generation performance. Based on these findings, we design a novel approach, COTTON, that leverages lLMs to automatically generate CoTs for code generation. We synthesize new datasets and conduct extensive experiments on various benchmarks. The results show that the CoTs generated by COTTON outperform the baselines on both automated and human evaluation metrics. In particular, the CoTs generated by COTTON help various lLMs achieve higher performance gains than those generated by LLMs such as ChatGLM (130B), and are competitive with those generated by gpt-3.5-turbo (175B). Our study also showcases the potential of lLMs in software engineering applications.
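The abstract's central finding is that an lLM can benefit from CoTs produced by a separate model. That two-stage flow can be sketched as follows: a CoT generator (the role COTTON plays) writes reasoning steps for a problem, and those steps are inserted into the prompt that the code-generating lLM completes. This is a hedged sketch, not the paper's implementation. The prompt templates and the functions `cot_request`, `code_request`, and `generate_with_cot` are hypothetical, and both models are stand-in string-to-string callables rather than any specific inference API.

```python
from typing import Callable, Tuple

# Any text-in/text-out model, e.g. a wrapper around a local lLM (hypothetical interface).
TextModel = Callable[[str], str]


def cot_request(problem: str) -> str:
    """Hypothetical prompt asking the CoT generator for step-by-step reasoning."""
    return (
        "Write the reasoning steps needed to solve the following problem.\n"
        f"Problem:\n{problem}\n"
        "Steps:\n"
    )


def code_request(problem: str, cot: str) -> str:
    """Hypothetical prompt that places the CoT, as comments, before the code is written."""
    commented = "\n".join(f"# {line}" for line in cot.splitlines() if line.strip())
    return f"{problem}\n# Let's think step by step:\n{commented}\n"


def generate_with_cot(problem: str, cot_model: TextModel,
                      code_model: TextModel) -> Tuple[str, str]:
    """Stage 1: a dedicated model writes the CoT. Stage 2: the lLM writes code guided by it."""
    cot = cot_model(cot_request(problem)).strip()
    code = code_model(code_request(problem, cot))
    return cot, code


if __name__ == "__main__":
    # Stub models for illustration: the "code model" echoes its prompt so it can be inspected.
    fake_cot_model = lambda prompt: "1. Iterate over the list.\n2. Keep the running maximum.\n"
    echo_model = lambda prompt: prompt
    cot, prompt_seen = generate_with_cot("def max_of(xs):", fake_cot_model, echo_model)
    print(prompt_seen)
```

Keeping both models behind a plain string-to-string interface lets the CoT generator and the code model be swapped independently. This matches the abstract's setup, in which one source of CoTs is paired with several different lLMs.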

Paper Structure

This paper contains 39 sections, 11 equations, 6 figures, and 14 tables.

Figures (6)

  • Figure 1: Motivating examples illustrating the potential of chain-of-thought for lLMs in code generation
  • Figure 2: The workflow of the proposed approach, COTTON
  • Figure 3: An example from the datasets used
  • Figure 4: A sample questionnaire used in the human study
  • Figure 5: The performance of COTTON with and without the Consistency Checker