Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs

Yuqi Zhu, Ge Li, Xue Jiang, Jia Li, Hong Mei, Zhi Jin, Yihong Dong

TL;DR

This work addresses the overthinking tendency of Chain-of-Thought prompts in code generation by introducing UnCertainty-Aware CoT (UnCert-CoT), which gates CoT reasoning using uncertainty estimates. It proposes two confidence-based measures, Entropy-based and Probability Differential-based, to decide when to invoke CoT-decoding and generate multiple reasoning paths, selecting the most likely correct code. The method demonstrates consistent improvements on code-generation benchmarks (HumanEval and MHPP) across multiple LLMs, with up to 6.1% absolute gains in PassRate and robustness to hyperparameter settings. By selectively applying reasoning only at challenging steps, UnCert-CoT enhances accuracy while preserving efficiency, offering a practical route to more reliable code generation with LLMs. The findings highlight the value of uncertainty-aware strategies in guiding structured reasoning for programming tasks and open avenues for integrating similar gating mechanisms in other complex generation domains.

Abstract

Chain-of-Thought (CoT) reasoning has been demonstrated as an effective technique for improving the problem-solving capabilities of large language models (LLMs) in the context of code generation. However, existing CoT methods often exhibit a tendency toward "overthinking", where the LLM consistently applies reasoning strategies without adequately considering the task's underlying complexity. As a result, LLMs allocate excessive computational resources, in terms of tokens, to relatively simple tasks or to problems where the correct answer is already evident. Additionally, this overthinking may lead LLMs down incorrect reasoning paths, resulting in incorrect code generation. In this paper, we introduce UnCertainty-Aware Chain-of-Thought (UnCert-CoT), an LLM-based approach designed to enhance code generation by incorporating an uncertainty-aware CoT reasoning mechanism, which focuses computational resources on the points where LLMs are more prone to error. We propose two confidence-based uncertainty measures: Entropy-based and Probability Differential-based methods. When uncertainty is high, UnCert-CoT activates CoT-decoding to generate multiple reasoning paths and selects the final code that exhibits the highest likelihood of correctness. In contrast, the LLM generates the code directly when uncertainty is low. This uncertainty judgment mechanism allows LLMs to prioritize complex tasks and avoid unnecessary steps in simpler cases, thereby improving overall efficiency and accuracy in code generation. Our experimental results demonstrate that UnCert-CoT significantly enhances code generation accuracy on the challenging MHPP (Mostly Hard Python Problems) benchmark, achieving improvements of up to 6.1% in PassRate accuracy, particularly in situations where traditional LLMs are prone to errors.
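
The two uncertainty measures named in the abstract can be illustrated with a minimal sketch. It assumes the next-token distribution is available as a plain list of probabilities, and it assumes common definitions: Shannon entropy for the Entropy-based measure, and the gap between the two most likely tokens for the Probability Differential-based measure. The function names are hypothetical, and the paper's exact formulas may differ.

```python
import math


def entropy_uncertainty(probs):
    """Shannon entropy (in nats) of a next-token distribution.

    Assumption: higher entropy is read as higher uncertainty.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def probability_differential(probs):
    """Gap between the two most likely tokens.

    Assumption: a small gap means the model is torn between
    candidates, so a small value is read as high uncertainty.
    """
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


# Illustrative distributions (made up for this sketch, not from the paper).
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(entropy_uncertainty(confident) < entropy_uncertainty(uncertain))
print(probability_differential(confident) > probability_differential(uncertain))
```

Note that the two measures point in opposite directions: entropy grows with uncertainty, while the probability gap shrinks, so a gating rule has to compare each against its threshold accordingly.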

Paper Structure

This paper contains 23 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: An illustration of the overthinking phenomenon in CoT code generation methods. The LLM could have answered the question correctly on its own; however, applying a CoT code generation method leads it to produce an incorrect reasoning chain and generate incorrect code.
  • Figure 2: Overview of UnCert-CoT. When generating the next code line, UnCert-CoT first calculates the uncertainty value. If this value exceeds a predefined threshold, the LLM is considered uncertain about the next step, and the CoT-decoding method is triggered to provide additional reasoning steps, thereby enhancing the accuracy of the final output.
  • Figure 3: Performance of UnCert-CoT under different $\tau$ threshold settings on HumanEval dataset.
  • Figure 4: Performance of UnCert-CoT under different $\tau$ threshold settings on MHPP dataset.
  • Figure 5: An example of the code generation results of UnCert-CoT and state-of-the-art baselines.
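
The gating step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `direct_decode` and `cot_decode_paths` are hypothetical callables standing in for the LLM, the uncertainty score is assumed to increase with uncertainty (as entropy does), and each reasoning path is assumed to return a candidate code line with a confidence score.

```python
def generate_next_line(uncertainty, tau, direct_decode, cot_decode_paths):
    """Choose between direct generation and CoT-decoding for one code line.

    uncertainty      -- score for the upcoming line (higher = less certain)
    tau              -- predefined threshold
    direct_decode    -- hypothetical callable returning a code line
    cot_decode_paths -- hypothetical callable returning a list of
                        (code_line, confidence) pairs, one per reasoning path
    """
    if uncertainty <= tau:
        # Low uncertainty: skip reasoning and generate the line directly.
        return direct_decode()
    # High uncertainty: explore several reasoning paths and keep the
    # candidate with the highest confidence.
    candidates = cot_decode_paths()
    best_line, _ = max(candidates, key=lambda pair: pair[1])
    return best_line


# Stub decoders with made-up outputs, used only to exercise the gate.
def stub_direct():
    return "return a + b"


def stub_cot():
    return [("return sorted(xs)[k]", 0.4), ("return sorted(xs)[k - 1]", 0.7)]


print(generate_next_line(0.2, 0.5, stub_direct, stub_cot))
print(generate_next_line(0.9, 0.5, stub_direct, stub_cot))
```

Lowering `tau` makes the gate trigger reasoning more often, which is the trade-off Figures 3 and 4 examine across threshold settings.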