FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning
Xiao Li, Bolin Zhu, Kaiwen Shi, Sichen Liu, Yin Zhu, Yiwei Liu, Gong Cheng
TL;DR
FormulaReasoning introduces a bilingual (English and Chinese) dataset for formula-based numerical reasoning, in which calculations must be grounded in explicit physics formulas such as $Q_{\text{absorbed}} = m c \Delta T$. Each question is annotated with normalized formulas, parameter names, symbols, units, and explanations, and a consolidated formula database serves as an external knowledge source. The study benchmarks a broad range of approaches, including LLMs with chain-of-thought (CoT) prompting, retrieval-augmented methods, supervised fine-tuning, and Direct Preference Optimization (DPO). The results reveal substantial performance gaps for smaller models and show the value of external formula knowledge. The dataset and benchmark results provide a solid baseline and a resource for future work on domain-guided, multi-step reasoning, and both are publicly released on HuggingFace and GitHub. Overall, the findings underscore the importance of explicit formula knowledge for robust numerical reasoning in real-world tasks.
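One way to picture a formula-grounded solution is as a chain of formula steps whose inputs and outputs each carry a name, symbol, value, and unit, matching the kinds of annotation listed above. The sketch below is illustrative only: the class and function names (`Parameter`, `FormulaStep`, `solve`) and the example question are hypothetical and are not the dataset's released schema. It solves a two-step heat problem, first computing $\Delta T$ and then $Q_{\text{absorbed}} = m c \Delta T$.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Parameter:
    name: str     # parameter name, e.g. "mass of water"
    symbol: str   # symbol used in formulas, e.g. "m"
    value: float  # numerical value
    unit: str     # unit of the value, e.g. "kg"


@dataclass(frozen=True)
class FormulaStep:
    formula: str        # normalized formula, kept as text for display
    output_name: str    # name of the quantity the formula computes
    output_symbol: str  # symbol of that quantity
    output_unit: str    # unit of that quantity
    compute: Callable[[Dict[str, float]], float]  # evaluates the right-hand side


def solve(given: List[Parameter], steps: List[FormulaStep]) -> Dict[str, Parameter]:
    """Apply the formula steps in order; each result becomes a known parameter."""
    known = {p.symbol: p for p in given}
    for step in steps:
        values = {symbol: p.value for symbol, p in known.items()}
        known[step.output_symbol] = Parameter(
            step.output_name, step.output_symbol, step.compute(values), step.output_unit
        )
    return known


# Hypothetical question: 2 kg of water is heated from 20 °C to 50 °C.
given = [
    Parameter("mass of water", "m", 2.0, "kg"),
    Parameter("specific heat capacity of water", "c", 4.2e3, "J/(kg·°C)"),
    Parameter("initial temperature", "t_0", 20.0, "°C"),
    Parameter("final temperature", "t", 50.0, "°C"),
]
steps = [
    FormulaStep("delta_T = t - t_0", "temperature change", "delta_T", "°C",
                lambda v: v["t"] - v["t_0"]),
    FormulaStep("Q_absorbed = m * c * delta_T", "heat absorbed", "Q_absorbed", "J",
                lambda v: v["m"] * v["c"] * v["delta_T"]),
]
result = solve(given, steps)
print(result["Q_absorbed"].value, result["Q_absorbed"].unit)  # 252000.0 J
```

Keeping the unit next to every intermediate value makes each step of the multi-step chain checkable on its own; the sketch does not perform unit conversion or dimensional analysis.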
Abstract
The application of formulas (e.g., physics formulas) is a fundamental human ability in solving numerical reasoning problems. Existing numerical reasoning datasets rarely state the formulas they employ explicitly, as their questions often rely on implicit commonsense mathematical knowledge. To address this gap, we introduce FormulaReasoning, a new dataset specifically designed for formula-based numerical reasoning. It consists of 5,324 questions that require numerical calculations grounded in external physics formulas. We provide normalized, fine-grained annotations in both English and Chinese, including formula structures, parameter names, symbols, numerical values, and units, curated through extensive manual effort with LLM-assisted validation to ensure high quality. Additionally, we offer a consolidated formula database to serve as an external knowledge source. We evaluate various reasoning approaches on FormulaReasoning, comparing different architectural and methodological frameworks. Our assessment covers retrieval-augmented methods, approaches that decompose reasoning into formula generation, parameter extraction, and numerical calculation, and optimization techniques that use preference data. We identify key challenges in formula-based numerical reasoning that require further investigation across different reasoning paradigms, highlighting opportunities for methodological advancement.
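The decomposed approaches mentioned in the abstract split a solution into a formula stage, a parameter-extraction stage, and a numerical-calculation stage. The sketch below illustrates that split under strong simplifying assumptions: the three-entry `FORMULA_DB` is a toy, hypothetical database (not the released one), the formula stage is replaced by naive keyword-overlap retrieval rather than generation, and parameter extraction is filled in by hand where a model would normally do it. All names (`FormulaEntry`, `retrieve_formula`, `calculate`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class FormulaEntry:
    formula: str                   # normalized formula text
    keywords: FrozenSet[str]       # hypothetical retrieval keys for this toy database
    compute: Callable[..., float]  # numerical calculation, keyed by parameter symbols


# Toy, hypothetical formula database (not the released one).
FORMULA_DB: List[FormulaEntry] = [
    FormulaEntry("Q_absorbed = m * c * delta_T",
                 frozenset({"heat", "absorbed", "specific", "temperature"}),
                 lambda m, c, delta_T: m * c * delta_T),
    FormulaEntry("rho = m / V",
                 frozenset({"density", "mass", "volume"}),
                 lambda m, V: m / V),
    FormulaEntry("v = s / t",
                 frozenset({"speed", "distance", "time"}),
                 lambda s, t: s / t),
]


def tokenize(text: str) -> Set[str]:
    """Lower-case alphabetic tokens of the question."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_formula(question: str, db: List[FormulaEntry]) -> FormulaEntry:
    """Formula stage (stand-in): pick the entry with the largest keyword overlap."""
    tokens = tokenize(question)
    return max(db, key=lambda entry: len(entry.keywords & tokens))


def calculate(entry: FormulaEntry, params: Dict[str, float]) -> float:
    """Calculation stage: evaluate the chosen formula with the extracted values."""
    return entry.compute(**params)


question = ("A runner covers a distance of 100 m in a time of 20 s. "
            "What is the runner's average speed?")
entry = retrieve_formula(question, FORMULA_DB)
# Parameter-extraction stage: assumed to be done by a model in practice;
# the values are written by hand here as a stand-in.
params = {"s": 100.0, "t": 20.0}
print(entry.formula, "->", calculate(entry, params), "m/s")  # v = s / t -> 5.0 m/s
```

Separating the stages this way lets each one be evaluated or swapped independently, for example replacing the keyword retriever with a dense retriever or an LLM, without touching the calculation step.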
