Chaining Simultaneous Thoughts for Numerical Reasoning

Zhihong Shao, Fei Huang, Minlie Huang

TL;DR

This work addresses numerical reasoning over textual problems by introducing CANTOR, a non-autoregressive DAG-based reasoner that predicts diverse reasoning steps in parallel and then chains relevant ones to form the final equation. It replaces pre-defined decoding orders with internal DAG-based dependencies and employs training regimes like hard EM, MML, and annealing to learn from fully- and weakly-supervised data. Empirically, CANTOR achieves state-of-the-art results on MathQA and SVAMP, improves performance on DROP under weak supervision, and remains competitive with far larger language models while offering faster inference. The approach demonstrates the value of explicit structure modeling and diverse reasoning paths for robust, scalable numerical reasoning.

Abstract

Given that rich information is hidden behind ubiquitous numbers in text, numerical reasoning over text should be an essential skill of AI systems. To derive precise equations to solve numerical reasoning problems, previous work focused on modeling the structures of equations, and has proposed various structured decoders. Though structure modeling proves to be effective, these structured decoders construct a single equation in a pre-defined autoregressive order, potentially placing an unnecessary restriction on how a model should grasp the reasoning process. Intuitively, humans may have numerous pieces of thoughts popping up in no pre-defined order; thoughts are not limited to the problem at hand, and can even be concerned with other related problems. By comparing diverse thoughts and chaining relevant pieces, humans are less prone to errors. In this paper, we take this inspiration and propose CANTOR, a numerical reasoner that models reasoning steps using a directed acyclic graph where we produce diverse reasoning steps simultaneously without pre-defined decoding dependencies, and compare and chain relevant ones to reach a solution. Extensive experiments demonstrated the effectiveness of CANTOR under both fully-supervised and weakly-supervised settings.

Paper Structure (39 sections, 21 equations, 4 figures, 13 tables)

Figures (4)

  • Figure 1: (a) Possible pieces of human thought that pop up in no pre-defined order; (b) How our model captures the reasoning process similarly. Reasoning steps inside solid frames and dashed frames are necessary and loosely-relevant ones, respectively.
  • Figure 2: Overview of CANTOR. CANTOR models diverse operations using a DAG. Each vertex corresponds to an operation, which is chained with its operands via edges in the graph. We decode an equation by simultaneously verbalizing operators at each vertex, chaining operations with operands, and selecting the root vertex; the selected root vertex along with all its descendants is the resulting equation in a DAG format. In this example, the ground-truth equation $Y$ can be represented by the decoded sub-graph $Z$, as mapping $y_1$ to $v_2$ and $y_2$ to $v_4$ produces $Z$ exactly.
  • Figure 3: A test case from SVAMP. Operations leading to the same quantity are marked with the same color. Purple ones are operations evaluating to the correct answer. For a clear presentation of our DAG, we only retain top-5 root vertices along with their descendants. We also present probabilities of predicted operators, operands, and root vertices. The best baseline, DeductReasoner, overlooks bonus points in its prediction. Although the same prediction appears as a sub-graph in our DAG, CANTOR filters it out and recognizes the correct one.
  • Figure 4: Two test cases from MathQA. Operations leading to the same value are marked with the same color and letter (e.g., A, B, etc.). Purple ones are operations evaluating to the correct answer. For a clear presentation of our DAG, we only retain top-5 root vertices along with their descendants. We also present probabilities of predicted operators, operands, and root vertices. For predictions from DeductReasoner, we mark the decoding order of operations with circled numbers; operations with forward slashes in the background are erroneous ones.
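
The decoding procedure described in the Figure 2 caption (an operator at each vertex, operand edges, and a selected root whose descendants form the equation) can be illustrated with a small sketch. This is not the authors' implementation: the tuple encoding of vertices, the `root_scores` values, the argmax root selection, and the helper names `descendants` and `evaluate` are all assumptions made for illustration.

```python
import operator

# Hypothetical operator table; the paper's operator vocabulary may differ.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

# Each vertex is (operator, operands). An operand is either
# ("q", value) for a quantity from the problem text, or
# ("v", index) for the result of another vertex (an edge in the DAG).
vertices = [
    ("+", (("q", 3.0), ("q", 5.0))),  # v0 = 3 + 5
    ("*", (("q", 3.0), ("q", 2.0))),  # v1 = 3 * 2 (loosely relevant, unused)
    ("*", (("v", 0), ("q", 2.0))),    # v2 = v0 * 2
    ("-", (("v", 2), ("v", 0))),      # v3 = v2 - v0
]

# Assumed per-vertex root probabilities; here the root is picked by argmax.
root_scores = [0.05, 0.05, 0.20, 0.70]
root = max(range(len(vertices)), key=root_scores.__getitem__)


def descendants(vertices, root):
    """Indices of the root and every vertex reachable through operand edges."""
    seen, stack = set(), [root]
    while stack:
        i = stack.pop()
        if i in seen:
            continue
        seen.add(i)
        for kind, ref in vertices[i][1]:
            if kind == "v":
                stack.append(ref)
    return sorted(seen)


def evaluate(vertices, root):
    """Evaluate the sub-graph rooted at `root`, reusing shared sub-results."""
    cache = {}

    def value(i):
        if i not in cache:
            op, operands = vertices[i]
            args = [value(ref) if kind == "v" else ref
                    for kind, ref in operands]
            cache[i] = OPS[op](*args)
        return cache[i]

    return value(root)


subgraph = descendants(vertices, root)  # vertices that make up the equation
answer = evaluate(vertices, root)
print(subgraph, answer)
```

In this toy example vertex `v1` is computed by the graph but does not appear in the selected sub-graph, mirroring the loosely-relevant steps shown in dashed frames in Figure 1.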