Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs
Yuqi Zhu, Ge Li, Xue Jiang, Jia Li, Hong Mei, Zhi Jin, Yihong Dong
TL;DR
This work addresses the overthinking tendency of Chain-of-Thought prompts in code generation by introducing UnCertainty-Aware CoT (UnCert-CoT), which gates CoT reasoning using uncertainty estimates. It proposes two confidence-based measures, Entropy-based and Probability Differential-based, to decide when to invoke CoT-decoding and generate multiple reasoning paths, selecting the most likely correct code. The method demonstrates consistent improvements on code-generation benchmarks (HumanEval and MHPP) across multiple LLMs, with up to 6.1% absolute gains in PassRate and robustness to hyperparameter settings. By selectively applying reasoning only at challenging steps, UnCert-CoT enhances accuracy while preserving efficiency, offering a practical route to more reliable code generation with LLMs. The findings highlight the value of uncertainty-aware strategies in guiding structured reasoning for programming tasks and open avenues for integrating similar gating mechanisms in other complex generation domains.
Abstract
Chain-of-Thought (CoT) reasoning has been demonstrated as an effective technique for improving the problem-solving capabilities of large language models (LLMs) in the context of code generation. However, existing CoT methods often exhibit a tendency toward "overthinking", where the LLM consistently applies reasoning strategies without adequately considering the task's underlying complexity. As a result, LLMs allocate excessive computational resources, in terms of tokens, to relatively simple tasks or to problems where the correct answer is already evident. Overthinking may also lead LLMs down incorrect reasoning paths, resulting in incorrect code. In this paper, we introduce UnCertainty-Aware Chain-of-Thought (UnCert-CoT), an LLM-based approach designed to enhance code generation by incorporating an uncertainty-aware CoT reasoning mechanism, which focuses computational resources on the points where LLMs are more prone to error. We propose two confidence-based uncertainty measures: an Entropy-based method and a Probability Differential-based method. When uncertainty is high, UnCert-CoT activates CoT-decoding to generate multiple reasoning paths and selects the final code that exhibits the highest likelihood of correctness. When uncertainty is low, the LLM generates the code directly. This uncertainty judgment mechanism allows LLMs to prioritize complex tasks and avoid unnecessary steps in simpler cases, thereby improving overall efficiency and accuracy in code generation. Our experimental results demonstrate that UnCert-CoT significantly enhances code generation accuracy on the challenging benchmark MHPP (Mostly Hard Python Problems), achieving improvements of up to 6.1% in PassRate accuracy, particularly in situations where traditional LLMs are prone to errors.
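The gating idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the threshold values, the reading of the Probability Differential as the gap between the two most likely next tokens, and the two decoding callables are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def probability_differential(probs: Sequence[float]) -> float:
    """Gap between the two most likely tokens (assumed definition).

    A small gap means the model is torn between alternatives.
    """
    top = sorted(probs, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def needs_cot(probs: Sequence[float], method: str = "entropy",
              threshold: float = 1.0) -> bool:
    """Return True when the chosen uncertainty measure signals high uncertainty.

    The threshold is a hypothetical hyperparameter, not a value from the paper.
    """
    if method == "entropy":
        return entropy(probs) > threshold
    if method == "prob_diff":
        return probability_differential(probs) < threshold
    raise ValueError(f"unknown method: {method}")


def generate_step(
    probs: Sequence[float],
    direct_decode: Callable[[], str],
    cot_decode: Callable[[], List[Tuple[str, float]]],
    method: str = "entropy",
    threshold: float = 1.0,
) -> str:
    """Gate between direct decoding and multi-path CoT decoding.

    `direct_decode` and `cot_decode` are hypothetical stand-ins for model calls;
    `cot_decode` returns (code, likelihood) pairs, one per reasoning path.
    """
    if needs_cot(probs, method, threshold):
        candidates = cot_decode()
        # Keep the candidate with the highest estimated likelihood.
        return max(candidates, key=lambda c: c[1])[0]
    return direct_decode()


if __name__ == "__main__":
    confident = [0.97, 0.01, 0.01, 0.01]   # one clear winner: decode directly
    uncertain = [0.25, 0.25, 0.25, 0.25]   # no clear winner: invoke CoT
    paths = lambda: [("return a + b", 0.4), ("return a - b", 0.9)]
    print(generate_step(confident, lambda: "x = 1", paths))
    print(generate_step(uncertain, lambda: "x = 1", paths))
```

With the entropy measure, a peaked distribution stays below the threshold and is decoded directly, while a flat distribution exceeds it and triggers the multi-path branch. With the differential measure the comparison is reversed, since a small gap between the top two tokens is what indicates uncertainty.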
