Reasoning in Transformers -- Mitigating Spurious Correlations and Reasoning Shortcuts

Daniel Enström; Viktor Kjellberg; Moa Johansson

TL;DR

This work probes whether transformers truly learn deductive reasoning in propositional logic or merely exploit spurious patterns in the data. It compares a whole-proof generator (WP-BART) with a neuro-symbolic, stepwise approach (SIP-BART) using an augmented SimpleLogicPS dataset that embeds proofs. SIP-BART substantially reduces reliance on spurious correlations, achieving over 99.8% accuracy across test sets, and its remaining consistency errors are grouped into four types. WP-BART underperforms and retains the shortcuts. The findings argue for neuro-symbolic architectures or constrained generation as a route to robust reasoning in language models, with practical implications for building trustworthy reasoning systems.

Abstract

Transformer language models are neural networks used for a wide variety of natural language tasks, including some that also require logical reasoning. However, a transformer model may easily learn spurious patterns in the data, short-circuiting actual reasoning. In this paper we investigate to what extent transformers can be trained to a) approximate reasoning in propositional logic while b) avoiding known reasoning shortcuts via spurious correlations in the training data. To do so, we use a dataset with a known spurious correlation between truth value and, for example, the number of rules in the problem. We augment the data with proofs and train two models: a generative transformer, WP-BART, trained on problems and their whole proofs, and a neuro-symbolic model, SIP-BART, trained on individual proof steps, which combines the generative transformer model BART with a symbolic proof checker. We find that SIP-BART succeeds in avoiding reasoning shortcuts, while WP-BART does not. For SIP-BART, we then identify a few remaining reasoning errors, not previously described in the literature, that arise from using a pre-trained language model. We analyse these qualitatively to create a taxonomy of four different types of additional pitfalls.
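The stepwise setup described in the abstract, a generator that proposes one proof step at a time and a symbolic checker that validates it, can be illustrated with a minimal sketch. This is not the authors' implementation. The names `check_step`, `iterative_proof`, and `forward_chaining_proposer` are hypothetical, the rule encoding is an assumption, and the neural generator is replaced by a deterministic stand-in. Stopping at the first rejected step is also an assumption made for this sketch.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

# Assumed encoding: a rule is (set of premises, conclusion).
Rule = Tuple[FrozenSet[str], str]
Proposer = Callable[[Set[Rule], Set[str], str], Optional[Rule]]


def check_step(rule: Rule, rules: Set[Rule], known: Set[str]) -> Optional[str]:
    """Return None if the step is valid, otherwise a short error label."""
    if rule not in rules:
        return "non-existing rule"   # the rule is not part of the problem
    premises, _ = rule
    if not premises <= known:
        return "inapplicable rule"   # some premise has not been derived yet
    return None


def iterative_proof(rules: Set[Rule], facts: Set[str], query: str,
                    propose_step: Proposer,
                    max_steps: int = 20) -> Tuple[str, List[Rule]]:
    """Build a proof one checked step at a time."""
    known = set(facts)
    proof: List[Rule] = []
    for _ in range(max_steps):
        if query in known:           # exact match only, no near-synonyms
            return "proved", proof
        step = propose_step(rules, known, query)
        if step is None:             # the proposer has nothing left to apply
            return "not proved", proof
        error = check_step(step, rules, known)
        if error is not None:        # assumption: stop at the first bad step
            return "rejected: " + error, proof
        known.add(step[1])
        proof.append(step)
    return ("proved" if query in known else "not proved"), proof


def forward_chaining_proposer(rules: Set[Rule], known: Set[str],
                              query: str) -> Optional[Rule]:
    """Deterministic stand-in for the neural generator (hypothetical)."""
    for premises, conclusion in sorted(rules, key=lambda r: (sorted(r[0]), r[1])):
        if conclusion not in known and premises <= known:
            return (premises, conclusion)
    return None


example_rules: Set[Rule] = {
    (frozenset({"a"}), "b"),
    (frozenset({"b", "c"}), "d"),
}
status, proof = iterative_proof(example_rules, {"a", "c"}, "d",
                                forward_chaining_proposer)
```

Because every proposed step passes through the checker before its conclusion is added to the known facts, a generator that invents a rule or applies one too early cannot contribute to the final proof in this sketch.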

Paper Structure (28 sections, 6 figures, 8 tables)

Introduction
Method
Data
Model Design and Training
Whole-Proof BART (WP-BART).
Symbolic Iterative Proof-BART (SIP-BART).
Training.
Evaluation
Results
Accuracy of Truth Values
Accuracy of WP-BART.
Accuracy of SIP-BART.
Consistency of SIP-BART
Non-existing Rule
Inapplicable Rule
...and 13 more sections

Figures (6)

Figure 1: Example of how the generation procedure works. The three blue boxes represent three instances of input strings in SimpleLogicPS and the white boxes represent their respective output strings. The neural model thus regards each step of the proof as an isolated problem to be solved, since each step is a separate training instance. The purple box represents the complete proof string created by the model once inference has finished for a given problem.
Figure 2: An overview of the SIP-BART architecture using an example of data-flow during inference.
Figure 3: Two examples where the Non-existing Rule error has occurred. In the first example (left), the generated rule includes the synonym "courageous" instead of "fearless". In the second example (right), the model generated part of the rule but missed the premise "charming".
Figure 4: Two examples where the generated inference steps include inapplicable rules. In both examples, the model mistakes the conclusion of the last rule in the input string for a fact, although that rule is not yet satisfied.
Figure 5: Two examples where a Spurious Match has occurred. In the first example (left), the fact glamorous has been mistaken for the query gorgeous. In the second example (right), the conclusion of the generated rule aggressive,attentive $\Rightarrow$ adorable is mistaken for the query cute.
...and 1 more figures
