Path-Consistency with Prefix Enhancement for Efficient Inference in LLMs
Jiace Zhu, Yuanzhe Huang, Yingtao Shen, Jie Zhao, An Zou
TL;DR
This paper addresses the high computational cost of self-consistency in LLM reasoning by introducing path-consistency, a prefix-enhancement method that extracts reliable prefixes from early reasoning paths to guide subsequent branches. It provides a probabilistic framework, showing that the final answer distribution can be expressed as $P(a|q) = \sum_{R_{prefix}} P(R_{prefix}|q) P(a|q, R_{prefix})$, enabling efficient, model-agnostic inference. Empirically, path-consistency yields substantial latency reductions (up to $40.5\%$) while maintaining or modestly improving accuracy across arithmetic, commonsense, and symbolic tasks, and its benefits carry over to larger models. The method is compatible with existing reasoning enhancements and reduces token consumption without retraining.
Abstract
To enhance the reasoning capabilities of large language models (LLMs), self-consistency has become a popular approach, combining multiple samplings with majority voting. However, current methods are computationally expensive and time-consuming because they require numerous samplings. To address this, this paper introduces path-consistency, which leverages the confidence of earlier-generated answers to identify the most promising prefix and uses that prefix to guide the generation of subsequent branches. This dynamic guidance mitigates the errors and redundancies caused by random or less useful sampling in self-consistency, and significantly accelerates inference by minimizing token consumption. Our extensive empirical results demonstrate that path-consistency reduces inference latency by up to 40.5\%, while maintaining task accuracy across various tasks, including mathematical reasoning, commonsense reasoning, and symbolic reasoning.
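
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the staged schedule, the use of the majority answer's vote share as the confidence measure, the `threshold` value, the rule of extending the prefix by one step per stage, and the names `path_consistency`, `sample_path`, and `toy_sampler` are all assumptions made for illustration. The sampler stands in for an LLM call that continues a reasoning path from a given prefix.

```python
from collections import Counter
from typing import Callable, List, Tuple

Path = Tuple[List[str], str]  # (reasoning steps, final answer)


def path_consistency(
    sample_path: Callable[[List[str]], Path],  # hypothetical LLM sampler
    num_branches: int = 20,
    num_stages: int = 4,
    threshold: float = 0.6,  # assumed confidence cutoff
) -> Tuple[str, int]:
    """Return (majority answer, number of newly generated steps)."""
    paths: List[Path] = []
    prefix: List[str] = []
    generated_steps = 0
    per_stage = num_branches // num_stages

    for _ in range(num_stages):
        # Sample a batch of branches, all continuing from the current prefix.
        for _ in range(per_stage):
            steps, answer = sample_path(prefix)
            generated_steps += len(steps) - len(prefix)
            paths.append((steps, answer))

        # Assumed confidence measure: vote share of the leading answer so far.
        top_answer, votes = Counter(a for _, a in paths).most_common(1)[0]
        if votes / len(paths) < threshold:
            continue  # not confident enough; keep the current prefix

        # Extend the prefix by one step, taken from a path that agrees with
        # the current prefix and reached the leading answer.
        donor = next(
            (s for s, a in paths
             if a == top_answer
             and s[:len(prefix)] == prefix
             and len(s) > len(prefix)),
            None,
        )
        if donor is not None:
            prefix = donor[:len(prefix) + 1]

    # Final answer by majority vote over all branches, as in self-consistency.
    final_answer = Counter(a for _, a in paths).most_common(1)[0][0]
    return final_answer, generated_steps


def toy_sampler(prefix: List[str]) -> Path:
    """Deterministic stand-in for an LLM: always the same four-step path."""
    full = ["s1", "s2", "s3", "s4"]
    return prefix + full[len(prefix):], "42"


answer, cost = path_consistency(toy_sampler)
baseline_cost = 20 * 4  # plain self-consistency: 20 branches, 4 steps each
```

With this toy sampler, each later stage reuses a longer prefix, so fewer steps are generated than the `baseline_cost` of sampling every branch from scratch. A real sampler would be stochastic, and the savings would depend on how often the confidence threshold is met.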
