Entailed Between the Lines: Incorporating Implication into NLI
Shreya Havaldar, Hamidreza Alvari, John Palowitch, Mohammad Javad Hosseini, Senaka Buthpitiya, Alex Fabrikant
TL;DR
This paper argues that human communication relies heavily on implicit meaning that standard NLI datasets fail to capture. It formalizes implied entailment by distinguishing explicit and implicit entailment, and introduces the Implied NLI (INLI) dataset (10k premises, 40k hypotheses) generated via a two-stage pipeline that augments implicature frames and creates alternative hypotheses, followed by thorough human validation. Empirical results show current NLI benchmarks underrepresent implied entailments, and models trained on INLI better recognize implied entailments, with strong generalization across domains and datasets. The work demonstrates that fine-tuning on INLI improves models' ability to read between the lines, with implications for real-world tasks like translation, summarization, and content evaluation.
Abstract
Much of human communication depends on implication, conveying meaning beyond literal words to express a wider range of thoughts, intentions, and feelings. For models to better understand and facilitate human communication, they must be responsive to a text's implicit meaning. We focus on Natural Language Inference (NLI), a core tool for many language tasks, and find that state-of-the-art NLI models and datasets struggle to recognize a range of cases where entailment is implied rather than explicitly stated in the text. We formalize implied entailment as an extension of the NLI task and introduce the Implied NLI dataset (INLI) to help today's LLMs both recognize a broader variety of implied entailments and distinguish between implicit and explicit entailment. We show that LLMs fine-tuned on INLI understand implied entailment and can generalize this understanding across datasets and domains.
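
To make the task formulation concrete, the following is a minimal sketch of how a premise paired with several hypotheses could be represented when entailment is split into an implied and an explicit case. The label names, the `NLIExample` type, the `to_three_way` helper, and the example sentences are all illustrative assumptions and are not taken from INLI. The choice of four hypotheses per premise is also an assumption, one that would be consistent with the reported 10k premises and 40k hypotheses.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    # Hypothetical label set: standard NLI with entailment split in two.
    IMPLIED_ENTAILMENT = "implied_entailment"
    EXPLICIT_ENTAILMENT = "explicit_entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: Label


def to_three_way(label: Label) -> str:
    """Collapse the extended label set back to standard 3-way NLI."""
    if label in (Label.IMPLIED_ENTAILMENT, Label.EXPLICIT_ENTAILMENT):
        return "entailment"
    return label.value


# Invented example text, for illustration only (not from the dataset).
PREMISE = (
    "A: Are you coming to the party tonight? "
    "B: I have to work an early shift tomorrow."
)

EXAMPLES = [
    NLIExample(PREMISE, "B is not coming to the party.", Label.IMPLIED_ENTAILMENT),
    NLIExample(PREMISE, "B has to work tomorrow.", Label.EXPLICIT_ENTAILMENT),
    NLIExample(PREMISE, "B dislikes parties.", Label.NEUTRAL),
    NLIExample(PREMISE, "B has tomorrow off.", Label.CONTRADICTION),
]

if __name__ == "__main__":
    for ex in EXAMPLES:
        print(f"{ex.label.value:20s} -> {to_three_way(ex.label):13s} | {ex.hypothesis}")
```

Under this sketch, a standard three-way NLI model would assign the first two hypotheses the same target label, while the extended label set asks a model to tell them apart.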
