Path-Consistency with Prefix Enhancement for Efficient Inference in LLMs

Jiace Zhu, Yuanzhe Huang, Yingtao Shen, Jie Zhao, An Zou

TL;DR

This paper addresses the high computational cost of self-consistency in LLM reasoning by introducing path-consistency, a prefix-enhancement method that extracts reliable prefixes from early reasoning paths and uses them to guide subsequent branches. It provides a probabilistic framework in which the final answer distribution is expressed as $P(a|q) = \sum_{R_{prefix}} P(R_{prefix}|q) P(a|q, R_{prefix})$, which supports efficient, model-agnostic inference. Empirically, path-consistency reduces latency by up to $40.5\%$ while maintaining or modestly improving accuracy across arithmetic, commonsense, and symbolic tasks, and it scales to larger models. The method is compatible with existing reasoning enhancements and reduces token consumption without retraining.
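
To make the decomposition concrete, here is a small worked instance. The numbers are invented for illustration and do not come from the paper. Assume only two candidate prefixes, $r_1$ and $r_2$, and let $a^*$ denote the correct answer, with $P(r_1|q) = 0.7$, $P(r_2|q) = 0.3$, $P(a^*|q, r_1) = 0.9$, and $P(a^*|q, r_2) = 0.4$. Summing over both prefixes gives:

$$
P(a^*|q) = P(r_1|q)\,P(a^*|q, r_1) + P(r_2|q)\,P(a^*|q, r_2) = 0.7 \cdot 0.9 + 0.3 \cdot 0.4 = 0.75
$$

Under these assumed values, branches that continue from $r_1$ alone would reach $a^*$ with probability $0.9$. This suggests why steering later samples toward a reliable prefix can preserve or improve accuracy while generating fewer tokens.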

Abstract

Self-consistency, which combines multiple sampled reasoning paths with majority voting, has become a popular way to enhance the reasoning capabilities of large language models (LLMs). However, it is computationally expensive and slow because it requires many samples. To address this, this paper introduces path-consistency, which uses the confidence of earlier-generated answers to identify the most promising prefix and then uses that prefix to guide the generation of subsequent branches. Guiding generation in this way mitigates the errors and redundancies caused by random or less useful sampling in self-consistency, and it accelerates inference by reducing token consumption. Our extensive empirical results demonstrate that path-consistency reduces inference latency by up to 40.5% while maintaining task accuracy across mathematical, commonsense, and symbolic reasoning tasks.
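
The prefix-guided sampling described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `path_consistency`, `toy_sampler`, `n_levels`, and `threshold` are hypothetical. The sketch uses the majority answer's vote share as a stand-in for the paper's confidence measure. `sample_path` stands for any function that continues a given prefix and returns the full list of reasoning steps plus a final answer.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Path = Tuple[List[str], str]  # (reasoning steps, final answer)


def path_consistency(
    sample_path: Callable[[Sequence[str]], Path],
    n_branches: int = 20,
    n_levels: int = 4,
    threshold: float = 0.6,
) -> Tuple[str, int]:
    """Return (majority answer, number of newly generated steps).

    Branches are sampled in equal windows. After each window, if the
    leading answer's vote share reaches `threshold` (an assumed proxy
    for confidence), the shared prefix grows by one step, taken from a
    path that reached that answer.
    """
    window = n_branches // n_levels
    prefix: List[str] = []
    paths: List[Path] = []
    generated_steps = 0

    for _ in range(n_levels):
        for _ in range(window):
            steps, answer = sample_path(prefix)
            # Only the steps after the prefix are newly generated.
            generated_steps += len(steps) - len(prefix)
            paths.append((steps, answer))

        votes = Counter(answer for _, answer in paths)
        top_answer, top_count = votes.most_common(1)[0]
        if top_count / len(paths) >= threshold:
            # Pick a path with the leading answer that extends the current prefix.
            donor = next(
                (s for s, a in paths
                 if a == top_answer and s[:len(prefix)] == prefix),
                None,
            )
            if donor is not None:
                prefix = donor[:len(prefix) + 1]

    final_votes = Counter(answer for _, answer in paths)
    return final_votes.most_common(1)[0][0], generated_steps


def toy_sampler(prefix: Sequence[str]) -> Path:
    """Hypothetical stand-in for an LLM: always four steps, answer '42'."""
    steps = list(prefix)
    while len(steps) < 4:
        steps.append(f"step {len(steps) + 1}")
    return steps, "42"


if __name__ == "__main__":
    answer, cost = path_consistency(toy_sampler)
    print(answer, cost)
```

With the toy sampler, each of the four windows of five branches generates one step fewer than the previous one (4, 3, 2, then 1 new steps per branch), for 50 generated steps in total. Plain self-consistency would generate 80 steps for the same 20 branches. A real sampler would call an LLM with the prefix appended to the prompt, and the savings would depend on how often the confidence criterion is met.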

Paper Structure

This paper contains 29 sections, 10 equations, 9 figures, 14 tables.

Figures (9)

  • Figure 1: Path-consistency extracts prefixes from earlier generated inference paths to guide the inference of subsequent branches.
  • Figure 2: Comparison of inference latency speedup and average token consumption reduction under different prefix levels, demonstrating the effect of path-consistency on GSM8K.
  • Figure 3: An "extract-and-sample" inference process of the proposed path-consistency. It seeks the "optimal path" in the form of the "prefix", thereby progressively reducing the number of generated tokens and significantly shortening inference latency.
  • Figure 4: Speedup of inference.
  • Figure 5: The change in the proportion of tokens generated by path-consistency on correct or incorrect paths.
  • ...and 4 more figures