Table of Contents
Fetching ...

Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving

Kuo Zhou, Lu Zhang

TL;DR

This paper tackles the challenge of verifying and improving LLM-based mathematical problem solving. It introduces MATH-VF, a training-free framework comprising a Formalizer that maps informal solutions into a formal language ($SimpleMath$) and a Critic that uses external tools (SymPy and Z3) to verify each step and provide corrective feedback. A key insight is the sparsity of the Solution Graph, which reduces the number of premises the Critic must consider, enabling efficient, accurate verification and refinement. Empirical results on MATH500 and ProcessBench show that MATH-VF outperforms other training-free baselines in verification and refinement, offering greater stability across problem difficulty and compatibility with closed-source models, while remaining distinct from fully training-based methods in its approach and requirements.

Abstract

Large Language Models (LLMs) have demonstrated formidable capabilities in solving mathematical problems, yet they may still commit logical reasoning and computational errors during the problem-solving process. Thus, this paper proposes a framework, MATH-VF, which includes a Formalizer and a Critic, for formally verifying the correctness of the solutions generated by large language models. Our framework first utilizes a Formalizer which employs an LLM to translate a natural language solution into a formal context. Afterward, our Critic (which integrates various external tools such as a Computer Algebra System and an SMT solver) evaluates the correctness of each statement within the formal context, and when a statement is incorrect, our Critic provides corrective feedback. We empirically investigate the effectiveness of MATH-VF in two scenarios: 1) Verification: MATH-VF is utilized to determine the correctness of a solution to a given problem. 2) Refinement: When MATH-VF identifies errors in the solution generated by an LLM-based solution generator for a given problem, it submits the corrective suggestions proposed by the Critic to the solution generator to regenerate the solution. We evaluate our framework on widely used mathematical benchmarks: MATH500 and ProcessBench, demonstrating the superiority of our approach over existing approaches.

Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving

TL;DR

This paper tackles the challenge of verifying and improving LLM-based mathematical problem solving. It introduces MATH-VF, a training-free framework comprising a Formalizer that maps informal solutions into a formal language () and a Critic that uses external tools (SymPy and Z3) to verify each step and provide corrective feedback. A key insight is the sparsity of the Solution Graph, which reduces the number of premises the Critic must consider, enabling efficient, accurate verification and refinement. Empirical results on MATH500 and ProcessBench show that MATH-VF outperforms other training-free baselines in verification and refinement, offering greater stability across problem difficulty and compatibility with closed-source models, while remaining distinct from fully training-based methods in its approach and requirements.

Abstract

Large Language Models (LLMs) have demonstrated formidable capabilities in solving mathematical problems, yet they may still commit logical reasoning and computational errors during the problem-solving process. Thus, this paper proposes a framework, MATH-VF, which includes a Formalizer and a Critic, for formally verifying the correctness of the solutions generated by large language models. Our framework first utilizes a Formalizer which employs an LLM to translate a natural language solution into a formal context. Afterward, our Critic (which integrates various external tools such as a Computer Algebra System and an SMT solver) evaluates the correctness of each statement within the formal context, and when a statement is incorrect, our Critic provides corrective feedback. We empirically investigate the effectiveness of MATH-VF in two scenarios: 1) Verification: MATH-VF is utilized to determine the correctness of a solution to a given problem. 2) Refinement: When MATH-VF identifies errors in the solution generated by an LLM-based solution generator for a given problem, it submits the corrective suggestions proposed by the Critic to the solution generator to regenerate the solution. We evaluate our framework on widely used mathematical benchmarks: MATH500 and ProcessBench, demonstrating the superiority of our approach over existing approaches.

Paper Structure

This paper contains 17 sections, 6 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Methods for verifying mathematical reasoning. (a) - (c) from previous work , (d) is our work
  • Figure 2: Examle of Fitch-style proof
  • Figure 3: The overview framework of MATH-VF including a solution formalizer and a tool-integreated critic is on the left, and tool-integreated critic is depicted in detail on the right. First, the problem and solution are input into the n solution formalizer, resulting in a context of the formal solution, which can be decomposed into several judgements. And then, we obtain the results through tool-integereated critic which determines the validity of judgments by leveraging both reasoning and tool invocations.
  • Figure 4: On the left side of the figure is a dense Solution Graph, where each statement is direct conclusion of all previous step's statements. On the right side is a sparse graph, in the solution represented by this graph, statement4 and statement5 have only one premise.
  • Figure 5: Prop3 is direct conlusion of Prop1 and Prop2 , Prop2 is direct conclusion of Prop2. Therefore we have judgement: Prop1 $\vdash$ Prop3.
  • ...and 1 more figures