Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic
Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme
TL;DR
This work introduces RDTE, a principled protocol grounded in informal logic for annotating decompositional textual entailment. The protocol addresses inconsistencies in prior datasets and the need for reliable reasoning steps in entailment trees. The authors present a high-quality RDTE dataset and a knowledge-distillation pipeline that uses GPT-4 to generate silver RDTE annotations, enabling smaller models to validate decompositions with strong precision. Building on this, they introduce TreeWise, an entailment-tree engine that combines backward chaining with forward inference, diverse prompts, and RDTE-based verification to ground hypotheses in verified corpora such as Wikipedia. On ARC and HotpotQA, TreeWise with RDTE distillation yields higher QA accuracy and higher tree integrity, highlighting the practical value of rigorous reasoning protocols for trustworthy natural language inference.
Abstract
Recent language models enable new opportunities for structured reasoning with text, such as the construction of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by a long-standing lack of a clear protocol for determining what constitutes valid compositional entailment. This absence has led to noisy datasets and has limited the performance gains of modern neuro-symbolic engines. To address these problems, we formulate a consistent and theoretically grounded approach to annotating decompositional entailment and evaluate its impact on LLM-based textual inference. We find that our new dataset, RDTE (Recognizing Decompositional Textual Entailment), has substantially higher internal consistency (+9%) than prior decompositional entailment datasets. We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in an entailment tree reasoning engine significantly improves both accuracy and proof quality, illustrating the practical benefit of this advance for textual inference.
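
To make concrete the idea of placing an entailment classifier inside an entailment tree engine, here is a minimal Python sketch of backward chaining in which every candidate decomposition must pass a verifier before it is expanded. It is an illustration under stated assumptions, not the authors' TreeWise implementation: the names `Node`, `prove`, `toy_decompose`, and `toy_verify`, the 0.5 threshold, and the toy facts and scores are all hypothetical. In a real system an LLM would propose decompositions, a distilled classifier would score them, and a retrieval step over a corpus would decide grounding. Forward inference and prompt diversity are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    """One node of an entailment tree: a hypothesis and the premises supporting it."""
    hypothesis: str
    children: List["Node"] = field(default_factory=list)
    grounded: bool = False  # True when the hypothesis is found directly in the corpus


def prove(
    hypothesis: str,
    decompose: Callable[[str], Sequence[Sequence[str]]],
    verify: Callable[[str, Sequence[str]], float],
    is_grounded: Callable[[str], bool],
    threshold: float = 0.5,  # assumed cutoff, chosen only for illustration
    depth: int = 3,
) -> Optional[Node]:
    """Backward-chain from a hypothesis, expanding only verified decompositions."""
    if is_grounded(hypothesis):
        return Node(hypothesis, grounded=True)
    if depth == 0:
        return None
    for premises in decompose(hypothesis):
        # Reject decompositions whose premises do not jointly entail the hypothesis.
        if verify(hypothesis, premises) < threshold:
            continue
        children = [
            prove(p, decompose, verify, is_grounded, threshold, depth - 1)
            for p in premises
        ]
        if all(child is not None for child in children):
            return Node(hypothesis, children)
    return None


# Toy stand-ins (hypothetical data, not drawn from RDTE or any real corpus).
FACTS = {"iron is a metal", "iron is magnetic", "metals conduct electricity"}
CANDIDATES = {
    "iron conducts electricity": [
        ["iron is a metal", "iron is magnetic"],            # true facts, but no entailment
        ["iron is a metal", "metals conduct electricity"],  # valid decomposition
    ]
}
SCORES = {
    ("iron conducts electricity", ("iron is a metal", "iron is magnetic")): 0.2,
    ("iron conducts electricity", ("iron is a metal", "metals conduct electricity")): 0.9,
}


def toy_decompose(hypothesis: str) -> Sequence[Sequence[str]]:
    return CANDIDATES.get(hypothesis, [])


def toy_verify(hypothesis: str, premises: Sequence[str]) -> float:
    return SCORES.get((hypothesis, tuple(premises)), 0.0)


tree = prove("iron conducts electricity", toy_decompose, toy_verify, FACTS.__contains__)
```

In this toy setting the first candidate consists entirely of grounded facts, so a search without verification would accept it even though its premises do not entail the hypothesis. With the verifier in place, that candidate is skipped and the tree is built from the second one.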
