LeDex: Training LLMs to Better Self-Debug and Explain Code

Nan Jiang, Xiaopeng Li, Shiqi Wang, Qiang Zhou, Soneya Binta Hossain, Baishakhi Ray, Varun Kumar, Xiaofei Ma, Anoop Deoras

TL;DR

LeDex tackles the difficulty open-source LLMs have in self-debugging and explaining code. It builds a scalable training workflow that automatically collects high-quality code explanations and refinements, verifies them with execution feedback, and then applies supervised fine-tuning (SFT) followed by reinforcement learning (RL) with a dual reward that scores both explanation quality and refinement quality. SFT improves pass@1 by up to 15.92% and pass@10 by up to 9.30% across four benchmarks, and RL adds up to a further 3.54% on pass@1 and 2.55% on pass@10. The trained models can refine code over multiple iterations, and the approach is robust across backbones (StarCoder-15B, CodeLlama-7B/13B) and data sources (GPT-3.5-Turbo, open-source LLMs, and self-bootstrapped data). Its generality is supported by successful data collection from open-source models and effective RL with the LeDex rewards, while human evaluations find that the trained models produce higher-quality explanations that help developers debug. Overall, LeDex is a model-agnostic, scalable framework that meaningfully improves self-debugging in open-source LLMs and yields richer, more actionable code explanations for developers.
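The pass@1 and pass@10 gains above are pass@k scores. This summary does not spell out how pass@k is computed; the sketch below assumes the standard unbiased estimator popularized by the Codex/HumanEval paper (Chen et al., 2021): draw n samples per problem, count the c that pass all unit tests, compute pass@k = 1 - C(n-c, k) / C(n, k), and average over problems. The function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples passing all unit tests, k: evaluation budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    # 1 - P(a uniformly random size-k subset contains only failing samples)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark, one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with n = 20 samples of which c = 3 pass, pass@1 = 1 - 17/20 = 0.15. Computing the ratio of binomial coefficients, rather than simply checking the first k samples, gives an unbiased, lower-variance estimate from the same n samples.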

Abstract

In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging for complex tasks. Prior works on self-debugging mostly focus on prompting methods that provide LLMs with few-shot examples, which work poorly on small open-source LLMs. In this work, we propose LeDex, a training framework that significantly improves the self-debugging capability of LLMs. Intuitively, we observe that generating a chain of explanations of the wrong code before refining it helps LLMs better analyze the wrong code and refine it. We thus propose an automated pipeline to collect a high-quality dataset for code explanation and refinement by generating a number of explanation and refinement trajectories from the LLM itself or a larger teacher model and filtering them via execution verification. We perform supervised fine-tuning (SFT) and further reinforcement learning (RL) on both success and failure trajectories with a novel reward design that considers code explanation and refinement quality. SFT improves pass@1 by up to 15.92% and pass@10 by up to 9.30% over four benchmarks. RL training brings an additional improvement of up to 3.54% on pass@1 and 2.55% on pass@10. The trained LLMs show iterative refinement ability and can keep refining code over successive rounds. Lastly, our human evaluation shows that the LLMs trained with our framework generate more useful code explanations and help developers better understand bugs in source code.
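The data-collection step above samples explanation and refinement trajectories and filters them by execution. A minimal sketch of that filtering, assuming a hypothetical `Trajectory` record and a simple test format of (arguments, expected return value) pairs; the names and test format are illustrative, not the paper's:

```python
from dataclasses import dataclass

# Hypothetical test format: (positional arguments, expected return value).
TestCase = tuple[tuple, object]


@dataclass
class Trajectory:
    """Hypothetical record of one sampled explanation-and-refinement trajectory."""
    wrong_code: str         # the incorrect program shown to the model
    explanation: str        # the model's explanation of what is wrong
    refined_code: str       # the model's proposed fix
    pass_rate: float = 0.0  # filled in by execution


def run_tests(source: str, entry_point: str, tests: list[TestCase]) -> float:
    """Run `source` and return the fraction of `tests` that `entry_point` passes.

    exec() runs untrusted model output in-process; a real pipeline needs a
    sandbox with time and memory limits.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        fn = namespace[entry_point]
    except Exception:
        return 0.0  # syntax errors or a missing function pass no tests
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error counts as a failed test
    return passed / len(tests)


def split_by_execution(
    trajectories: list[Trajectory], entry_point: str, tests: list[TestCase]
) -> tuple[list[Trajectory], list[Trajectory]]:
    """Score every refinement; fully passing ones become candidate SFT data."""
    success: list[Trajectory] = []
    failure: list[Trajectory] = []
    for t in trajectories:
        t.pass_rate = run_tests(t.refined_code, entry_point, tests)
        (success if t.pass_rate == 1.0 else failure).append(t)
    return success, failure
```

Failing trajectories are kept with their pass rates rather than discarded, since RL trains on both success and failure trajectories. The paper's actual rewards also draw on CodeBLEU and an explanation score (Figure 3), which this sketch leaves out.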

Paper Structure (37 sections, 7 equations, 13 figures, 15 tables)

Figures (13)

  • Figure 1: Pipeline for letting an LLM generate code and self-debug.
  • Figure 2: Overview of LeDex.
  • Figure 3: The CodeBLEU scores, unit test cases passing rate, sentiment similarity of wrong code explanations, final refinement code reward, and the explanation reward of the training data.
  • Figure 4: Pass@k of prompting, SFT, and RL CodeLlama-7B after three iterations of refinement.
  • Figure 5: Two different prompts for asking the LLM to self-refine: directly asking for a refinement (left), and asking for an explanation of the wrong code followed by a refinement in chain-of-thought style (right).
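Figures 1, 4, and 5 describe the self-debugging loop: the model generates code, the code is executed, and on failure the model is asked to refine it, either directly or by first explaining the wrong code, for several rounds (three in Figure 4). The sketch below illustrates such a loop under stated assumptions: `model` and `check` are hypothetical callables (an LLM call, and a unit-test runner returning a pass flag plus feedback text), the prompt wording is our own rather than the paper's Figure 5 templates, and the refined code is assumed to arrive in the last fenced code block of the response.

```python
import re
from typing import Callable

# Hypothetical prompt wording; the paper's real templates (Figure 5) may differ.
DIRECT = (
    "This code fails its tests:\n{code}\nFeedback:\n{feedback}\n"
    "Write a corrected version in a code block."
)
EXPLAIN_THEN_REFINE = (
    "This code fails its tests:\n{code}\nFeedback:\n{feedback}\n"
    "First explain why the code is wrong, then write a corrected version in a code block."
)


def extract_code(response: str) -> str:
    """Return the last fenced code block in `response`, or the whole response if none."""
    blocks = re.findall(r"`{3}(?:python)?\n(.*?)`{3}", response, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else response.strip()


def self_debug(
    model: Callable[[str], str],
    check: Callable[[str], tuple[bool, str]],
    task_prompt: str,
    max_iters: int = 3,
    explain_first: bool = True,
) -> tuple[str, int, bool]:
    """Generate code, then refine it until `check` passes or `max_iters` rounds are spent.

    Returns (final code, refinement rounds used, whether the final code passed).
    """
    template = EXPLAIN_THEN_REFINE if explain_first else DIRECT
    code = extract_code(model(task_prompt))  # initial attempt
    ok, feedback = check(code)
    rounds = 0
    while not ok and rounds < max_iters:
        # Feed the wrong code and its execution feedback back to the model.
        code = extract_code(model(template.format(code=code, feedback=feedback)))
        rounds += 1
        ok, feedback = check(code)
    return code, rounds, ok
```

Setting `explain_first=False` switches to the direct-refinement prompt, mirroring the two prompt styles compared in Figure 5.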