Table of Contents
Fetching ...

Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability

Ruida Wang, Yuxin Li, Yi R. Fung, Tong Zhang

TL;DR

This work introduces NFL-HR, an end-to-end framework that augments natural-language mathematical problem solving with Formal Language (FL) reasoning. It bridges NL and FL via NL-FL Problem Alignment (QA-existence translation and autoformalization), FL Reasoner Problem Solving (Mixed Problem Input with a Lean4 prover), and Answer Extraction to recover NL answers from FL reasoning. Empirical results on MATH-500 and AMC show improvements over NL baselines (up to about 4.8 percentage points) and demonstrate NFL-HR’s ability to solve problems that NL alone cannot on some instances. The study highlights the value of integrating rigorous FL reasoning into NL tasks, analyzes computation costs, and discusses limitations, generalizability, and future directions for broader cross-domain reasoning enhancements.

Abstract

Enhancing the mathematical reasoning capabilities of LLMs has garnered significant attention in both the mathematical and computer science communities. Recent works have made substantial progress in both Natural Language (NL) reasoning and Formal Language (FL) reasoning by leveraging the potential of pure Reinforcement Learning (RL) methods on base models. However, RL approaches struggle to impart new capabilities not presented in the base model, highlighting the need to integrate more knowledge like FL into NL math reasoning effectively. Yet, this integration is challenging due to inherent disparities in problem structure and reasoning format between NL and FL. To address these challenges, we introduce **NL-FL HybridReasoning (NFL-HR)**, an end-to-end framework designed to incorporate the FL expert into NL math problem-solving. To bridge the NL and FL input format gap, we propose the NL-FL Problem Alignment method, which reformulates the Question-Answering (QA) problems in NL as existence theorems in FL. Subsequently, the Mixed Problem Input technique we provide enables the FL reasoner to handle both QA and existence problems concurrently. Lastly, we mitigate the NL and FL output format gap in reasoning through an LLM-based Answer Extraction mechanism. Comprehensive experiments demonstrate that the NFL-HR framework achieves **89.80**% and **84.34%** accuracy rates on the MATH-500 and the AMC benchmarks, surpassing the NL baseline by **4.60%** and **4.82%**, respectively. Notably, some problems resolved by our framework remain unsolved by the NL baseline model even under a larger number of trials.

Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability

TL;DR

This work introduces NFL-HR, an end-to-end framework that augments natural-language mathematical problem solving with Formal Language (FL) reasoning. It bridges NL and FL via NL-FL Problem Alignment (QA-existence translation and autoformalization), FL Reasoner Problem Solving (Mixed Problem Input with a Lean4 prover), and Answer Extraction to recover NL answers from FL reasoning. Empirical results on MATH-500 and AMC show improvements over NL baselines (up to about 4.8 percentage points) and demonstrate NFL-HR’s ability to solve problems that NL alone cannot on some instances. The study highlights the value of integrating rigorous FL reasoning into NL tasks, analyzes computation costs, and discusses limitations, generalizability, and future directions for broader cross-domain reasoning enhancements.

Abstract

Enhancing the mathematical reasoning capabilities of LLMs has garnered significant attention in both the mathematical and computer science communities. Recent works have made substantial progress in both Natural Language (NL) reasoning and Formal Language (FL) reasoning by leveraging the potential of pure Reinforcement Learning (RL) methods on base models. However, RL approaches struggle to impart new capabilities not presented in the base model, highlighting the need to integrate more knowledge like FL into NL math reasoning effectively. Yet, this integration is challenging due to inherent disparities in problem structure and reasoning format between NL and FL. To address these challenges, we introduce **NL-FL HybridReasoning (NFL-HR)**, an end-to-end framework designed to incorporate the FL expert into NL math problem-solving. To bridge the NL and FL input format gap, we propose the NL-FL Problem Alignment method, which reformulates the Question-Answering (QA) problems in NL as existence theorems in FL. Subsequently, the Mixed Problem Input technique we provide enables the FL reasoner to handle both QA and existence problems concurrently. Lastly, we mitigate the NL and FL output format gap in reasoning through an LLM-based Answer Extraction mechanism. Comprehensive experiments demonstrate that the NFL-HR framework achieves **89.80**% and **84.34%** accuracy rates on the MATH-500 and the AMC benchmarks, surpassing the NL baseline by **4.60%** and **4.82%**, respectively. Notably, some problems resolved by our framework remain unsolved by the NL baseline model even under a larger number of trials.

Paper Structure

This paper contains 44 sections, 3 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: NL-FL HybridReasoning (NFL-HR) framework: (a) NL-FL Problem Alignment: We first translate the QA-style NL problem into the NL existence problem using a general LLM, followed by converting the problem into an FL existence theorem through an autoformalizer. (b) FL Reasoner Problem Solving: We then apply the mixed problem input technique to ask the FL reasoner to concurrently address the QA problem in NL and the existence problem in FL within a unified Long CoT thinking process. (c) Answer Extraction: Finally, we use the LLM to extract the implicit NL answer from the FL prover's Long CoT output.
  • Figure 2: QA problem for NL math dataset
  • Figure 3: QA reformatted to existence problem
  • Figure 4: Formalized Lean4 statement
  • Figure 5: Input-Output of the FL reasoner
  • ...and 1 more figures