Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation

Qiming Bao; Alex Yuxuan Peng; Tim Hartill; Neset Tan; Zhenyun Deng; Michael Witbrock; Jiamou Liu

TL;DR

The paper tackles end-to-end multi-step deductive reasoning over natural language by introducing IMA-GloVe-GA, an iterative memory network with gated attention that extends DeepLogic to word-level natural-language inputs. The model is evaluated on PARARULES, CONCEPTRULES V1 and CONCEPTRULES V2, where it reaches higher test accuracy than DeepLogic and other RNN baselines and generalises better out of distribution than RoBERTa-Large when the rules are shuffled. In several settings its results are competitive with or superior to those of RoBERTa-Large, which points to the value of gated attention and iterative inference for natural-language reasoning. The paper also introduces PARARULE-Plus, a large dataset with more examples that require deeper reasoning, to address the unbalanced distribution of reasoning depths in existing datasets; adding it improves performance on examples at greater depths. PARARULE-Plus and the source code are released as a resource for studying reasoning depth in natural-language tasks.

Abstract

Combining deep learning with symbolic logic reasoning aims to capitalize on the success of both fields and is drawing increasing attention. Inspired by DeepLogic, an end-to-end model trained to perform inference on logic programs, we introduce IMA-GloVe-GA, an iterative neural inference network for multi-step reasoning expressed in natural language. In our model, reasoning is performed using an iterative memory neural network based on RNN with a gated attention mechanism. We evaluate IMA-GloVe-GA on three datasets: PARARULES, CONCEPTRULES V1 and CONCEPTRULES V2. Experimental results show DeepLogic with gated attention can achieve higher test accuracy than DeepLogic and other RNN baseline models. Our model achieves better out-of-distribution generalisation than RoBERTa-Large when the rules have been shuffled. Furthermore, to address the issue of unbalanced distribution of reasoning depths in the current multi-step reasoning datasets, we develop PARARULE-Plus, a large dataset with more examples that require deeper reasoning steps. Experimental results show that the addition of PARARULE-Plus can increase the model's performance on examples requiring deeper reasoning depths. The source code and data are available at https://github.com/Strong-AI-Lab/Multi-Step-Deductive-Reasoning-Over-Natural-Language.

Paper Structure (8 sections, 7 equations, 3 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Problem Definition
Method
The Datasets
Experiments
Conclusion
Appendix

Figures (3)

Figure 1: A depth-2 example from HotpotQA (Yang et al., 2018).
Figure 2: Examples from PARARULES (Clark et al., 2020). The context (facts + rules) and the question are grouped as the input, and the output is a Boolean value indicating whether the question is true or false, given the context.
Figure 3: The iterative neural cell with word-level embeddings as input. Questions and contexts are represented as word embeddings, and then attentions are computed to pick up related rules. Gated attention is used to compute the weighted sum of the Unifier GRU outputs. Then the weighted sum updates the state for the next iteration.
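
The flow described in the Figure 3 caption (attend over rules, gate the attention, take a weighted sum of per-rule outputs, update the state) can be illustrated with a toy sketch. This is not the authors' implementation: the function name `iterative_step`, the scalar gate, the `gate_bias` parameter, and the rule that leftover weight keeps the old state are all assumptions made for illustration, and the per-rule vectors passed in as `unifier_outputs` merely stand in for what the Unifier GRU would produce.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def iterative_step(state, rule_vecs, unifier_outputs, gate_bias=0.0):
    """One hypothetical iteration: attend over rules, gate, update the state."""
    # Attention: how relevant each rule is to the current state (assumed dot-product scoring).
    attn = softmax([dot(state, r) for r in rule_vecs])
    # Gate: a value in (0, 1) per rule that can damp its attention (assumed form).
    gates = [sigmoid(dot(state, r) + gate_bias) for r in rule_vecs]
    weights = [a * g for a, g in zip(attn, gates)]
    # Weighted sum of the per-rule outputs (stand-ins for Unifier GRU outputs).
    summed = [
        sum(w * u[i] for w, u in zip(weights, unifier_outputs))
        for i in range(len(state))
    ]
    # Assumed update: weight not assigned to any rule preserves the old state.
    keep = 1.0 - sum(weights)
    new_state = [keep * s + v for s, v in zip(state, summed)]
    return new_state, weights

# Hypothetical 2-dimensional toy data: two rules and their stand-in outputs.
state = [1.0, 0.0]
rule_vecs = [[1.0, 0.0], [0.0, 1.0]]
unifier_outputs = [[0.0, 1.0], [1.0, 0.0]]

for _ in range(2):  # two reasoning iterations
    state, weights = iterative_step(state, rule_vecs, unifier_outputs)
```

Running several iterations in a loop mirrors the idea of multi-step inference: each pass can select a different rule because the state it is compared against has changed.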
