Table of Contents
Fetching ...

Typed Chain-of-Thought: A Curry-Howard Framework for Verifying LLM Reasoning

Elija Perrier

TL;DR

This work introduces Proof-Carrying Chain-of-Thought (PC-CoT), a decoding-time framework based on the Curry-Howard correspondence to make LLM reasoning verifiably computational. By typing each CoT step into a lightweight formal system and building Typed Reasoning Graphs, PC-CoT produces Typed Faithfulness Certificates that certify dataflow from premises to conclusions. It combines a three-stage pipeline (typed program emission, TRG construction, and Certified Self-Consistency) with two certification gates (Relaxed and Strict) to balance coverage and precision. Empirically on GSM8K, PC-CoT yields substantial accuracy gains (up to ~70% with Relaxed gate) and high precision (>90% under strict certification) while providing auditable, human-readable proof-like traces. The approach demonstrates the potential of formal verification techniques to transform interpretability from plausible narratives into verifiable computational artifacts, offering a scalable path toward reliable AI reasoning.

Abstract

While Chain-of-Thought (CoT) prompting enhances the reasoning capabilities of large language models, the faithfulness of the generated rationales remains an open problem for model interpretability. We propose a novel theoretical lens for this problem grounded in the Curry-Howard correspondence, which posits a direct relationship between formal proofs and computer programs. Under this paradigm, a faithful reasoning trace is analogous to a well-typed program, where each intermediate step corresponds to a typed logical inference. We operationalise this analogy, presenting methods to extract and map the informal, natural language steps of CoT into a formal, typed proof structure. Successfully converting a CoT trace into a well-typed proof serves as a strong, verifiable certificate of its computational faithfulness, moving beyond heuristic interpretability towards formal verification. Our framework provides a methodology to transform plausible narrative explanations into formally verifiable programs, offering a path towards building more reliable and trustworthy AI systems.

Typed Chain-of-Thought: A Curry-Howard Framework for Verifying LLM Reasoning

TL;DR

This work introduces Proof-Carrying Chain-of-Thought (PC-CoT), a decoding-time framework based on the Curry-Howard correspondence to make LLM reasoning verifiably computational. By typing each CoT step into a lightweight formal system and building Typed Reasoning Graphs, PC-CoT produces Typed Faithfulness Certificates that certify dataflow from premises to conclusions. It combines a three-stage pipeline (typed program emission, TRG construction, and Certified Self-Consistency) with two certification gates (Relaxed and Strict) to balance coverage and precision. Empirically on GSM8K, PC-CoT yields substantial accuracy gains (up to ~70% with Relaxed gate) and high precision (>90% under strict certification) while providing auditable, human-readable proof-like traces. The approach demonstrates the potential of formal verification techniques to transform interpretability from plausible narratives into verifiable computational artifacts, offering a scalable path toward reliable AI reasoning.

Abstract

While Chain-of-Thought (CoT) prompting enhances the reasoning capabilities of large language models, the faithfulness of the generated rationales remains an open problem for model interpretability. We propose a novel theoretical lens for this problem grounded in the Curry-Howard correspondence, which posits a direct relationship between formal proofs and computer programs. Under this paradigm, a faithful reasoning trace is analogous to a well-typed program, where each intermediate step corresponds to a typed logical inference. We operationalise this analogy, presenting methods to extract and map the informal, natural language steps of CoT into a formal, typed proof structure. Successfully converting a CoT trace into a well-typed proof serves as a strong, verifiable certificate of its computational faithfulness, moving beyond heuristic interpretability towards formal verification. Our framework provides a methodology to transform plausible narrative explanations into formally verifiable programs, offering a path towards building more reliable and trustworthy AI systems.

Paper Structure

This paper contains 59 sections, 8 equations, 6 figures, 5 tables.

Figures (6)

  • Figure : (a) Overall accuracy (95% Wilson CI)
  • Figure : (a) Coverage vs. UVR minimum
  • Figure : (a) Overall accuracy (95% Wilson CI)
  • Figure : (b) Reliability vs. acceptance count
  • Figure : (a) Coverage vs. UVR minimum
  • ...and 1 more figures