
Enhancing Numerical Reasoning with the Guidance of Reliable Reasoning Processes

Dingzirui Wang, Longxu Dou, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

TL;DR

Encore tackles numerical reasoning by ensuring that the reasoning process a model learns from is reliable, that is, it fully supports the answer. It introduces a four-step retrieve-locate-decompose-fine-tune pipeline plus three tabular pre-training tasks that anchor reasoning in table structure. Across five datasets, Encore yields consistent gains (about 1.8% on average), and small models trained with Encore's reasoning processes outperform those trained on reasoning produced by gpt-3.5-turbo. The approach reduces dependence on large, expensive models while delivering interpretable, table-grounded reasoning for complex numerical tasks.
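The pre-training tasks are described here only as learning reasoning-process generation from synthesized, table-grounded data. As a rough illustration of what such synthesis could look like, the sketch below samples two cells from a table, combines them with an operator, and emits the raw formula, a header-located version of it, and the answer. The table format, the operator set, the `synthesize` function, and the output fields are all hypothetical choices for this sketch; they are not the paper's actual pre-training tasks.

```python
import operator
import random

# Hypothetical table format for this sketch: (row head, column head) -> value.
Table = dict[tuple[str, str], float]

# Operators used to build synthetic formulas. Division is omitted to avoid
# dividing by zero; this is an assumption of the sketch, not of the paper.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def synthesize(table: Table, n: int, seed: int = 0) -> list[dict]:
    """Generate n synthetic examples, each pairing a two-operand formula over
    table cells with its header-located form and its computed answer."""
    rng = random.Random(seed)  # seeded so the synthetic data is reproducible
    cells = list(table.items())
    if len(cells) < 2:
        raise ValueError("need at least two cells to build a formula")
    examples = []
    for _ in range(n):
        # Pick two distinct cells, each as ((row head, column head), value).
        ((r1, c1), v1), ((r2, c2), v2) = rng.sample(cells, 2)
        op = rng.choice(sorted(OPS))
        examples.append({
            # str(float) round-trips exactly, so the formula text matches the answer.
            "formula": f"{v1} {op} {v2}",
            "located": f"[{r1} | {c1}] {op} [{r2} | {c2}]",
            "answer": OPS[op](v1, v2),
        })
    return examples
```

Because each example is generated from the table itself, every operand in the target output is guaranteed to be grounded in a cell, which is the property the pre-training is meant to teach.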

Abstract

Numerical reasoning is an essential ability for NLP systems to handle numeric information. Recent research indicates that fine-tuning a small-scale model to generate reasoning processes alongside answers can significantly enhance performance. However, most current methods generate these reasoning processes with large language models (LLMs), and such processes are "unreliable" because they may contain information unrelated to the answer. To address this limitation, we introduce Enhancing NumeriCal reasOning with Reliable procEsses (Encore), which derives a reliable reasoning process by decomposing the answer formula, ensuring that the process fully supports the answer. Nevertheless, because our method derives only a single reasoning process per formula, models may lack enough data to learn reasoning-process generation adequately. To overcome this difficulty, we present a series of pre-training tasks that help models learn reasoning-process generation from synthesized data. Experiments show that Encore improves performance on all five experimental datasets, by 1.8% on average, demonstrating the effectiveness of our method.
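To make "decomposing the answer formula" concrete, here is a minimal sketch, assuming the answer formula is a plain arithmetic expression over numbers. It parses the formula with Python's `ast` module and emits one step per binary operation, with intermediate results referenced as `#0`, `#1`, and so on, so every step rests on either a number from the formula or an earlier step. The `decompose` function, the step tuple layout, and the supported operator set are hypothetical; the paper's exact output format is not given in this section.

```python
import ast
import operator

# Binary operators supported in this sketch (an assumption; the paper's
# operator set is not specified here).
_OPS = {
    ast.Add: ("add", operator.add),
    ast.Sub: ("subtract", operator.sub),
    ast.Mult: ("multiply", operator.mul),
    ast.Div: ("divide", operator.truediv),
}


def decompose(formula: str) -> tuple[list[tuple[str, str, str, str]], float]:
    """Split an arithmetic formula into ordered (step_id, op, left, right) steps.

    Returns the steps in evaluation order and the final numeric value.
    """
    steps: list[tuple[str, str, str, str]] = []

    def visit(node: ast.AST) -> tuple[str, float]:
        # Leaf: a numeric literal is its own reference.
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return repr(node.value), float(node.value)
        # Internal node: decompose both operands first, then record this step.
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            name, fn = _OPS[type(node.op)]
            left_ref, left_val = visit(node.left)
            right_ref, right_val = visit(node.right)
            ref = f"#{len(steps)}"
            steps.append((ref, name, left_ref, right_ref))
            return ref, fn(left_val, right_val)
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    tree = ast.parse(formula, mode="eval")
    _, value = visit(tree.body)
    return steps, value
```

For an average of two values, `decompose("(1200 + 1500) / 2")` yields an `add` step `#0` over the two numbers followed by a `divide` step `#1` over `#0` and `2`, with final value `1350.0`. Unary minus and function calls are rejected in this sketch.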

Paper Structure

This paper contains 50 sections, 4 figures, and 18 tables.

Figures (4)

  • Figure 1: Reasoning processes generated by gpt-3.5-turbo (left) and Encore (right). The left process is written in natural language; bold words are unrelated to the answer. The right process consists of the three parts designed in Encore, all of which fully support the answer.
  • Figure 2: Illustration of Encore, using the question "What is the average current federal of 2018-2019?" as an example. Encore consists of four steps: 1. Retrieve question-related evidence. 2. Locate the table heads of each value in the formula. 3. Decompose the located formula into operators and operands. 4. Fine-tune the model with the input and the generated output.
  • Figure 3: Number of error cases on TAT-QA numerical reasoning questions for $\mathrm{BART_{LARGE}}$ with and without Encore, grouped by error type. #Cases denotes the number of cases of each error type.
  • Figure 4: An example from the TAT-QA dev set comparing $\mathrm{BART_{LARGE}}$ with and without Encore. Correct entities are highlighted in green; incorrect entities are highlighted in red.
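Step 2 in Figure 2 (locating the table heads of each value in the formula) can be sketched as a lookup from numbers to the cells that contain them. The sketch below assumes a hypothetical table format mapping (row head, column head) to a value, matches numbers by exact equality, leaves unmatched constants (such as the 2 in an average) unchanged, and refuses ambiguous matches. The `locate` function, the `[row | column]` notation, and the cell values in the usage note are illustrative placeholders, not data or code from the paper.

```python
import re

# Hypothetical table format for this sketch: (row head, column head) -> value.
Table = dict[tuple[str, str], float]


def locate(formula: str, table: Table) -> str:
    """Replace each number in `formula` with the heads of the cell holding it.

    Numbers that match no cell are kept as-is. A number that matches more
    than one cell raises ValueError, since its location would be ambiguous.
    """

    def head_of(match: re.Match) -> str:
        value = float(match.group())
        cells = [heads for heads, v in table.items() if v == value]
        if len(cells) > 1:
            raise ValueError(f"ambiguous value {match.group()}: {cells}")
        if not cells:
            return match.group()  # e.g. a constant divisor, not a table cell
        row, col = cells[0]
        return f"[{row} | {col}]"

    # Plain decimal numbers only; signs and thousands separators are not handled.
    return re.sub(r"\d+(?:\.\d+)?", head_of, formula)
```

With a made-up table `{("Current federal", "2019"): 1200.0, ("Current federal", "2018"): 1500.0}`, the formula `(1200 + 1500) / 2` becomes `([Current federal | 2019] + [Current federal | 2018]) / 2`, which could then be decomposed into operators and operands as in step 3.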