Identification of Entailment and Contradiction Relations between Natural Language Sentences: A Neurosymbolic Approach
Xuyao Feng, Anthony Hunter
TL;DR
This work tackles the lack of explainability in natural language inference by proposing a neurosymbolic pipeline that translates sentences into Abstract Meaning Representation graphs, then into propositional logic, and finally uses SAT-based reasoning to determine entailment, contradiction, or neutrality. It introduces relaxation via neuro-matching and a forgetting mechanism to handle lexical variation and commonsense knowledge, enabling more robust entailment and contradiction detection. Evaluations on e-SNLI, SICK, and MultiNLI demonstrate that relaxation improves recall for entailment and contradiction and that explanations can boost performance on e-SNLI, though neutral detection remains challenging. The approach provides explicit, verifiable reasoning steps and a framework to incorporate commonsense via embedding-based relaxation, contributing to more transparent NLP inference systems.
Abstract
Natural language inference (NLI), also known as Recognizing Textual Entailment (RTE), is an important aspect of natural language understanding. Most current research uses machine learning and deep learning to perform this task on specific datasets, so the resulting solutions are neither explainable nor explicit. To address the need for an explainable approach to RTE, we propose a novel pipeline based on translating text into an Abstract Meaning Representation (AMR) graph using a pre-trained AMR parser. We then translate the AMR graph into propositional logic and use a SAT solver for automated reasoning. Commonsense often suggests that an entailment (or contradiction) relationship holds between a premise and a claim, but because the two are worded differently, the relationship cannot be identified from their logical representations. To address this, we introduce relaxation methods that allow some propositions to be replaced or forgotten. Our experimental results show that this pipeline performs well on four RTE datasets.
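The reasoning step of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the AMR parsing and AMR-to-logic translation are skipped, the formulas are written by hand, a brute-force truth-table check stands in for a real SAT solver, and the names `classify` and `relax` are hypothetical. The replacement map would come from an embedding-based matcher in the setting the abstract describes; here it is supplied manually. Forgetting is approximated by dropping any clause that mentions a forgotten atom, which may differ from the paper's definition.

```python
from itertools import product

# Assumed encoding (not from the paper): a formula is a list of clauses (CNF),
# a clause is a list of literals, and a literal is an atom name that is
# optionally prefixed with "-" to mark negation.

def atoms(*formulas):
    """Collect the atom names used in the given formulas."""
    return sorted({lit.lstrip("-") for f in formulas for clause in f for lit in clause})

def holds(formula, assignment):
    """True if every clause has at least one literal satisfied by the assignment."""
    return all(
        any(assignment[lit.lstrip("-")] != lit.startswith("-") for lit in clause)
        for clause in formula
    )

def classify(premise, claim):
    """Label the pair by enumerating all truth assignments (stand-in for a SAT solver)."""
    names = atoms(premise, claim)
    models = [dict(zip(names, values))
              for values in product([False, True], repeat=len(names))]
    premise_models = [m for m in models if holds(premise, m)]
    # Note: an unsatisfiable premise is vacuously labelled "entailment" here.
    if all(holds(claim, m) for m in premise_models):
        return "entailment"      # premise AND NOT claim is unsatisfiable
    if not any(holds(claim, m) for m in premise_models):
        return "contradiction"   # premise AND claim is unsatisfiable
    return "neutral"

def relax(formula, replace=None, forget=()):
    """Rename atoms via `replace` and drop clauses mentioning atoms in `forget`."""
    replace = replace or {}
    relaxed = []
    for clause in formula:
        if any(lit.lstrip("-") in forget for lit in clause):
            continue  # simplified forgetting: discard the whole clause
        relaxed.append([
            ("-" if lit.startswith("-") else "")
            + replace.get(lit.lstrip("-"), lit.lstrip("-"))
            for lit in clause
        ])
    return relaxed

# Hand-written stand-ins for "A man is sleeping" / "A person is sleeping".
premise = [["man"], ["sleep"]]
claim = [["person"], ["sleep"]]

print(classify(premise, claim))                                      # strict check
print(classify(premise, relax(claim, replace={"person": "man"})))    # after replacement
print(classify(premise, [["man"], ["-sleep"]]))                      # negated claim
```

Without relaxation the atom `person` is unconstrained by the premise, so the strict check cannot establish entailment; mapping `person` to `man` lets the same check succeed. This mirrors the motivation given in the abstract, where differing wordings hide a relationship that commonsense suggests.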
