LogitsCoder: Towards Efficient Chain-of-Thought Path Search via Logits Preference Decoding for Code Generation

Jizheng Chen, Weiming Zhang, Xinyi Dai, Weiwen Liu, Kounianhua Du, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

TL;DR

LogitsCoder tackles underthinking and overthinking in chain-of-thought (CoT) code generation by replacing costly search with lightweight, logits-guided mechanisms. It introduces Logits Preference Decoding to bias token choices, Logits Rank Based Path Selection to sample diverse reasoning trajectories, and Thoughts Aggregation to fuse multiple CoT paths into a single, robust reasoning chain before code generation. In extensive experiments on APPS and CodeContest, LogitsCoder achieves higher code quality and better token efficiency than MCTS-based and decoding-based baselines, and it scales well at test time. The results suggest that targeted, probabilistic control over CoT paths yields practical gains in accuracy and efficiency on complex coding tasks, and that such control is a promising route to scalable, real-time reasoning enhancements for open-source LLMs.

Abstract

Code generation remains a challenging task that requires precise and structured reasoning. Existing Test-Time Scaling (TTS) methods, including structured tree search, have made progress in exploring reasoning paths but still face two major challenges: (1) underthinking, where reasoning chains tend to be shallow and fail to capture the full complexity of problems; and (2) overthinking, where overly verbose reasoning leads to inefficiency and increased computational costs. To address these issues, we propose LogitsCoder, a novel framework that enhances chain-of-thought (CoT) reasoning for code generation through lightweight, logit-level control mechanisms. LogitsCoder iteratively generates and refines reasoning steps: it first steers token selection toward statistically preferred patterns via Logits Preference Decoding, then selects and aggregates diverse reasoning paths using Logits Rank Based Path Selection and Thoughts Aggregation. This yields coherent and effective reasoning chains that balance depth and efficiency. Extensive experiments demonstrate that LogitsCoder produces more efficient, higher-quality reasoning chains, leading to superior code generation performance compared with baseline methods.
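This summary does not give the exact formulation of Logits Preference Decoding. As a rough illustration of the general idea of steering token selection at the logit level, the following minimal sketch adds a preference bias to a model's raw next-token logits before the softmax. The bias vector, the strength parameter `alpha`, and all function names are hypothetical assumptions for illustration, not the authors' method.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preference_adjusted_distribution(
    logits: Sequence[float],
    preference_bias: Sequence[float],  # hypothetical per-token preference scores
    alpha: float = 1.0,  # hypothetical steering strength; 0.0 recovers plain decoding
) -> list[float]:
    """Next-token distribution after adding alpha * preference_bias to the logits."""
    if len(logits) != len(preference_bias):
        raise ValueError("logits and preference_bias must have the same length")
    adjusted = [z + alpha * b for z, b in zip(logits, preference_bias)]
    return softmax(adjusted)


def greedy_token(probs: Sequence[float]) -> int:
    """Index of the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Toy 3-token vocabulary: the bias nudges token 1 past token 0.
raw_logits = [2.0, 1.9, 0.5]
bias = [0.0, 0.5, 0.0]
print(greedy_token(preference_adjusted_distribution(raw_logits, bias, alpha=0.0)))  # 0
print(greedy_token(preference_adjusted_distribution(raw_logits, bias, alpha=1.0)))  # 1
```

Setting `alpha` to 0 leaves decoding unchanged, so in this sketch the strength of the steering is a single tunable knob rather than a separate search procedure.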

Paper Structure

This paper contains 44 sections, 16 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: (A) Challenges faced by MCTS: shallow search trees cause underthinking, while excessive rollouts cause overthinking and high computational cost. (B) The LogitsCoder framework addresses these challenges with Logits Preference Decoding (LPD) and Logits Rank Based Path Selection (LRBPS), yielding more efficient and deeper reasoning paths.
  • Figure 2: Framework overview. LogitsCoder iteratively generates and refines reasoning chains through two stages. In Thought Generation, initial reasoning steps are generated with LPD to bias token selection toward higher-quality outputs. In Thought Refinement, LRBPS and Thoughts Aggregation are applied to enhance step-level accuracy and coherence. This iterative process continues until a complete reasoning chain is formed for Code Generation.
  • Figure 3: Ablation study: Performance of variants of LogitsCoder on APPS and CodeContest datasets.
  • Figure 4: TTS performance of LogitsCoder versus RethinkMCTS, with the number of rollouts varied from 2 to 20.
  • Figure 5: Case study comparing LogitsCoder's search process with a linear CoT generation process on a programming task.
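Figure 2 places LRBPS in the Thought Refinement stage, but this summary does not specify how candidate paths are scored or how ranks are turned into a selection. The sketch below assumes one plausible reading: score each candidate reasoning step by its mean token log-probability, rank the candidates, and sample a few distinct ones with weight proportional to 1/rank, so that selection favors confident paths without discarding lower-ranked alternatives. The scoring rule, the 1/rank weighting, and all function names are hypothetical assumptions, and the subsequent Thoughts Aggregation step is not modeled.

```python
import random
from typing import Mapping, Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-probability of one candidate reasoning path."""
    if not token_logprobs:
        raise ValueError("a path must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def rank_paths(paths: Mapping[str, Sequence[float]]) -> list[str]:
    """Path ids ordered from most to least confident (rank 1 first)."""
    return sorted(paths, key=lambda pid: mean_logprob(paths[pid]), reverse=True)


def rank_based_select(paths: Mapping[str, Sequence[float]], k: int, seed: int = 0) -> list[str]:
    """Sample k distinct paths without replacement, weighting each by 1 / rank.

    Hypothetical scheme: weights depend only on rank order, so top paths are
    favored while lower-ranked ones keep a nonzero chance (preserving diversity).
    """
    ranked = rank_paths(paths)
    if not 0 < k <= len(ranked):
        raise ValueError("k must be between 1 and the number of candidate paths")
    rng = random.Random(seed)  # seeded for reproducible selection
    pool = list(enumerate(ranked, start=1))  # (rank, path_id) pairs
    chosen = []
    for _ in range(k):
        weights = [1.0 / rank for rank, _ in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx)[1])  # remove so each path is picked at most once
    return chosen


# Toy candidates: per-token log-probabilities for three reasoning steps.
candidates = {
    "step_a": [-0.1, -0.2],  # mean -0.15
    "step_b": [-1.0, -1.2],  # mean -1.10
    "step_c": [-0.5, -0.4],  # mean -0.45
}
print(rank_paths(candidates))  # ['step_a', 'step_c', 'step_b']
print(rank_based_select(candidates, k=2))
```

Because the weights use only rank order, this selection is insensitive to the absolute scale of the log-probabilities, which can vary across models and prompt lengths.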