Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning
Zhi Zhou, Tan Yuhao, Zenan Li, Yuan Yao, Lan-Zhe Guo, Xiaoxing Ma, Yu-Feng Li
TL;DR
The paper addresses reliable multi-path LLM reasoning by decomposing reasoning error into estimation error and model error. The decomposition reveals complementary strengths: perplexity-based confidence converges quickly in estimation error but carries higher model error, while self-consistency has lower model error but converges slowly. The paper introduces Reasoning-Pruning Perplexity Consistency (RPC), which combines Perplexity Consistency with Reasoning Pruning to achieve exponential convergence of estimation error while keeping model error low. Theoretical analyses provide explicit convergence rates and pruning guarantees, and extensive experiments on seven benchmarks in math and code generation show that RPC improves reasoning accuracy, sample efficiency, and confidence calibration across model scales. By leveraging internal probabilities and principled path pruning, the approach promises practical gains for robust and efficient LLM reasoning on complex tasks.
Abstract
Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning capabilities. However, single-shot inference often yields unreliable results for complex reasoning tasks, leading researchers to explore multiple reasoning paths through methods such as perplexity and self-consistency. In this paper, we present the first theoretical error decomposition analysis of these techniques, breaking down their error into estimation error and model error. Our analysis reveals a fundamental trade-off: perplexity methods suffer from substantial model error due to the absence of a proper consistency function, while self-consistency exhibits high estimation error due to a slow error convergence rate. To overcome these limitations, we propose Reasoning-Pruning Perplexity Consistency (RPC). This approach combines Perplexity Consistency, which seamlessly integrates LLM perplexity with self-consistency, and Reasoning Pruning, which eliminates low-probability reasoning paths to effectively prevent the degeneration of estimation error reduction. Theoretical analysis demonstrates that RPC not only accelerates the convergence rate of estimation error to an exponential level but also holds strong potential for further reducing model error. Extensive empirical evaluations on seven benchmark datasets confirm that RPC can significantly improve reasoning performance, sample efficiency, and confidence reliability.
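
To make the contrast between the confidence estimators concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes each sampled reasoning path arrives as a hypothetical tuple `(path_text, answer, logprob)`, where `logprob` is the LLM's total log-probability of the path. Self-consistency is taken as the empirical answer frequency, and Perplexity Consistency is approximated by summing the internal probabilities of the distinct sampled paths that reach the same answer. The fixed `min_logprob` cutoff is a simplified stand-in for Reasoning Pruning; the abstract does not specify how the pruning threshold is chosen, so the cutoff and all function names here are illustrative assumptions.

```python
import math
from collections import defaultdict

def self_consistency(paths):
    """Confidence per answer = fraction of sampled paths that reach it."""
    counts = defaultdict(int)
    for _, answer, _ in paths:
        counts[answer] += 1
    total = len(paths)
    return {answer: count / total for answer, count in counts.items()}

def perplexity_consistency(paths):
    """Confidence per answer = summed LLM probability of the distinct
    sampled paths that reach it (assumed form, see lead-in)."""
    seen = set()
    scores = defaultdict(float)
    for text, answer, logprob in paths:
        if text in seen:  # count each distinct path only once
            continue
        seen.add(text)
        scores[answer] += math.exp(logprob)
    return dict(scores)

def prune_low_probability(paths, min_logprob):
    """Simplified stand-in for Reasoning Pruning: drop paths whose
    log-probability falls below a fixed, hand-picked cutoff."""
    return [p for p in paths if p[2] >= min_logprob]

def rpc_sketch(paths, min_logprob):
    """Prune, then score the surviving paths with perplexity consistency."""
    kept = prune_low_probability(paths, min_logprob) or paths  # never return empty
    scores = perplexity_consistency(kept)
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical samples: (path_text, answer, logprob)
example_paths = [
    ("path-a", "42", math.log(0.30)),
    ("path-b", "42", math.log(0.20)),
    ("path-a", "42", math.log(0.30)),  # the same path sampled twice
    ("path-c", "41", math.log(0.05)),
    ("path-d", "40", math.log(0.01)),
]

sc_scores = self_consistency(example_paths)
pc_scores = perplexity_consistency(example_paths)
best_answer, rpc_scores = rpc_sketch(example_paths, min_logprob=math.log(0.03))
```

In this toy data, self-consistency counts the duplicated path twice, whereas the probability-based score counts it once and weights each distinct path by the model's own probability. Pruning then removes the least likely path before scoring, so its answer receives no confidence mass at all.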
