Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems

Fan Zhou, Haoyu Dong, Qian Liu, Zhoujun Cheng, Shi Han, Dongmei Zhang

TL;DR

This work tackles unreliable numerical reasoning in language models by introducing reflection of thought via Solving Linear Systems (SoLiS), a training-free approach that probes the model with simple anchor numbers to inversely elicit hidden arithmetic relationships. The core idea is to substitute complex operands with anchors, observe the LM outputs, and formulate the deduction of the underlying arithmetic as a solvable linear system, with three solving strategies (analytical, search-based, heuristic) to handle noise. Experiments on DROP, AddSub, and MultiArith show consistent, significant improvements across a range of backbones and prompting settings, highlighting robustness and portability. The method yields interpretable intermediate expressions and extends LM numerical reasoning without additional training, suggesting broad applicability to real-world numerical reasoning tasks.

Abstract

Numerical reasoning over natural language has been a long-standing goal for the research community. However, cutting-edge language models struggle to generalize reliably to a broad range of numbers, although they have shown proficiency in reasoning over common and simple numbers. In this paper, we propose a novel method to elicit and exploit the numerical reasoning knowledge hidden in pre-trained language models using simple anchor numbers. Concretely, we first use simple numbers as anchors to probe the arithmetic expressions that language models implicitly infer, and then explicitly apply those expressions to complex numbers to obtain the corresponding answers. To inversely elicit arithmetic expressions, we transform and formulate the task as an analytically solvable linear system. Experimental results on several numerical reasoning benchmarks demonstrate that our approach significantly improves the numerical reasoning capabilities of existing LMs. More importantly, our approach is training-free and works purely in the inference phase, making it highly portable and yielding consistent performance benefits across a variety of language models (GPT-3, T5, BART, etc.) in zero-shot, few-shot, and fine-tuning scenarios.
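The anchor-probing idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: `mock_lm` is a hypothetical stand-in for a real language model call, the two-operand linear form `y = a1*x1 + a2*x2 + b` is an assumed simplification, and the anchor values and complex operands are made up for illustration.

```python
import numpy as np

def mock_lm(x1, x2):
    """Hypothetical stand-in for an LM answering a question whose hidden
    arithmetic is x1 - x2; assumed reliable on simple numbers."""
    return x1 - x2

def elicit_expression(lm, anchor_groups):
    """Recover (a1, a2, b) of the assumed form y = a1*x1 + a2*x2 + b
    from the LM's answers on anchor operands."""
    # One row [x1, x2, 1] per anchor group; the unknowns are (a1, a2, b).
    A = np.array([[x1, x2, 1.0] for x1, x2 in anchor_groups])
    y = np.array([lm(x1, x2) for x1, x2 in anchor_groups], dtype=float)
    # Least squares reduces to the exact solution when A is square and
    # invertible, and also accepts more anchor groups than unknowns.
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

# Three groups of simple anchor numbers substituted for the real operands.
anchors = [(5, 2), (7, 3), (9, 8)]
a1, a2, b = elicit_expression(mock_lm, anchors)

# Apply the recovered expression explicitly to the original complex operands.
answer = a1 * 3847.25 + a2 * 1296.5 + b
print(np.round([a1, a2, b], 6), round(float(answer), 2))
```

Here the LM is only ever queried on small integers; the arithmetic on the complex operands is carried out outside the model once the coefficients are known.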

Paper Structure (26 sections, 1 equation, 8 figures, 11 tables, 1 algorithm)

Figures (8)

  • Figure 1: The illustration of our proposed framework, which elicits numerical reasoning in language models via Solving Linear Systems (SoLiS).
  • Figure 2: Performance with different floating point precision (left) and integer range (right).
  • Figure 3: The experimental results of Chain and Chain w.SoLiS on AddSub as the number of few-shot examples decreases.
  • Figure 4: The experimental results of SoLiS on MathExp with different choices of anchor number range (left) and anchor number groups (right).
  • Figure 5: Performance over different floating point precision (left) and integer range (right) on MathExp of GPT-3 w. search-based algorithm.
  • ...and 3 more figures