Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback
Zhongtao Miao, Kaiyan Zhao, Yoshimasa Tsuruoka
TL;DR
This work improves arithmetic reasoning in large language models with ART, a framework that uses relation tuples as semi-structured reasoning steps, paired with a local Python-based verifier and a dynamic feedback loop. Each reasoning step is linked to a relation tuple $(r_i,t_i)$ and validated through Python code $C_i$ executed by a local interpreter, producing a verification result $\hat{A_i^v}$ that is compared with the initial answer $\hat{A_i}$. Across seven arithmetic datasets and multiple LLMs, ART outperforms the natural-language CoT, code-based PAL, and ModelSelection baselines, with notable gains on SVAMP and GSM8K, and it remains compatible with Self-Consistency. The method yields readable, machine-verifiable reasoning and a lightweight, model-agnostic verification pathway that can be integrated into existing prompting pipelines.
Abstract
Current representations used in reasoning steps of large language models can mostly be categorized into two main types: (1) natural language, which is difficult to verify; and (2) non-natural language, usually programming code, which is difficult for people who are unfamiliar with coding to read. In this paper, we propose to use a semi-structured form to represent reasoning steps of large language models. Specifically, we use relation tuples, which are not only human-readable but also machine-friendly and easier to verify than natural language. We implement a framework that includes three main components: (1) introducing relation tuples into the reasoning steps of large language models; (2) implementing an automatic verification process of reasoning steps with a local code interpreter based on relation tuples; and (3) integrating a simple and effective dynamic feedback mechanism, which we found helpful for self-improvement of large language models. The experimental results on various arithmetic datasets demonstrate the effectiveness of our method in improving the arithmetic reasoning ability of large language models. The source code is available at https://github.com/gpgg/art.
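The verify-and-retry loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository is the reference for that). The names `run_verifier`, `answers_agree`, and `solve_with_feedback` are hypothetical, as are the `reason` and `to_code` callables that stand in for LLM calls, the convention that verification code binds its result to a variable named `answer`, the retry limit, and the fallback of returning the last answer when no round agrees.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature for an LLM call: (question, feedback) -> (relation tuples, answer).
Reasoner = Callable[[str, str], Tuple[List[str], Optional[float]]]
# Hypothetical signature for an LLM call: relation tuples -> Python verification code.
Coder = Callable[[List[str]], str]


def run_verifier(code: str) -> Optional[float]:
    """Run verification code locally and return the number bound to `answer`.

    Assumption: the generated code assigns its result to `answer`.
    Emptying the builtins is NOT a real sandbox; it only keeps the sketch small.
    """
    namespace: dict = {}
    try:
        exec(code, {"__builtins__": {}}, namespace)
    except Exception:
        return None  # code that fails to run yields no verification result
    value = namespace.get("answer")
    return float(value) if isinstance(value, (int, float)) else None


def answers_agree(a: Optional[float], b: Optional[float], tol: float = 1e-6) -> bool:
    """Two answers agree when both exist and differ by at most `tol`."""
    return a is not None and b is not None and abs(a - b) <= tol


def solve_with_feedback(question: str, reason: Reasoner, to_code: Coder,
                        max_rounds: int = 3) -> Optional[float]:
    """Retry reasoning until the initial answer matches the verifier's answer."""
    feedback = ""  # empty on the first round
    answer: Optional[float] = None
    for _ in range(max_rounds):
        tuples, answer = reason(question, feedback)
        verified = run_verifier(to_code(tuples))
        if answers_agree(answer, verified):
            return answer
        # Assumed feedback format: tell the model that the two answers disagreed.
        feedback = f"Previous answer {answer} disagreed with verifier result {verified}."
    return answer  # assumed fallback: last answer, unverified
```

In use, `reason` and `to_code` would wrap prompts to the same or different models; keeping them as plain callables means the loop itself does not depend on any particular LLM API.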
