Table of Contents
Fetching ...

ProofSketch: Efficient Verified Reasoning for Large Language Models

Disha Sheshanarayana, Tanishka Magar

TL;DR

ProofSketch tackles the inefficiency of long reasoning chains in large language models by introducing a verification-guided framework that generates multiple short, atomic sketches and selects the one with maximal verification coverage. It combines symbolic closure computation with verifier-gated generation and lexicographic verification to balance accuracy, efficiency, and formal guarantees. The approach is formalized as a multi-objective optimization with token budgets and certification requirements, and is evaluated on the ProofWriter dataset across multiple models, demonstrating substantial token savings and meaningful certification rates while maintaining competitive accuracy. The work advances trustworthy neural reasoning in compute-constrained settings and highlights avenues to further reduce latency and extend to more complex domains.

Abstract

Reasoning methods such as chain-of-thought prompting and self-consistency have shown immense potential to improve the accuracy of large language models across various reasoning tasks. However such methods involve generation of lengthy reasoning chains, which substantially increases token consumption, computational cost, and latency. To address this inefficiency, we propose ProofSketch, a verification-guided reasoning framework that integrates symbolic closure computation, lexicographic verification and adaptive sketch generation. Our experiments show that ProofSketch consistently reduces token usage while improving accuracy, demonstrating that this approach offers a promising path for efficient and trustworthy reasoning.

ProofSketch: Efficient Verified Reasoning for Large Language Models

TL;DR

ProofSketch tackles the inefficiency of long reasoning chains in large language models by introducing a verification-guided framework that generates multiple short, atomic sketches and selects the one with maximal verification coverage. It combines symbolic closure computation with verifier-gated generation and lexicographic verification to balance accuracy, efficiency, and formal guarantees. The approach is formalized as a multi-objective optimization with token budgets and certification requirements, and is evaluated on the ProofWriter dataset across multiple models, demonstrating substantial token savings and meaningful certification rates while maintaining competitive accuracy. The work advances trustworthy neural reasoning in compute-constrained settings and highlights avenues to further reduce latency and extend to more complex domains.

Abstract

Reasoning methods such as chain-of-thought prompting and self-consistency have shown immense potential to improve the accuracy of large language models across various reasoning tasks. However such methods involve generation of lengthy reasoning chains, which substantially increases token consumption, computational cost, and latency. To address this inefficiency, we propose ProofSketch, a verification-guided reasoning framework that integrates symbolic closure computation, lexicographic verification and adaptive sketch generation. Our experiments show that ProofSketch consistently reduces token usage while improving accuracy, demonstrating that this approach offers a promising path for efficient and trustworthy reasoning.

Paper Structure

This paper contains 17 sections, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: ProofSketch framework generates sketches from theories and questions, applies closure verification through forward chaining, and uses adaptive budgeting for re-generation when verification fails. Successful verification produces certified outputs, while failures trigger budget-controlled iterative refinement until certification is achieved.
  • Figure 2: Pareto Frontier for Accuracy vs Token Usage across Models
  • Figure 3: Token Savings
  • Figure 4: Ablation Study