Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions
Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen McKeown, Doug Downey, Yejin Choi
TL;DR
This work introduces a linguistically grounded framework for automatically generating exemplars of generic statements: instantiations and exceptions, i.e., cases where a generic does and does not hold, respectively. By formalizing three generic categories with corresponding logical forms, the authors derive templates and employ NeuroLogic$^{\star}$ constrained decoding to control generation, followed by viability and validity filtering with RoBERTa-based discriminators. Across ~650 generics, the method yields ~19k exemplars and outperforms a GPT-3 baseline by 12.8 precision points in human evaluations, demonstrating improved controllability and quality, especially for exceptions. The study highlights the limitations of commonsense knowledge bases as a source of exemplars, the necessity of linguistic-theory-guided decoding, and the current challenges exemplars pose for natural language inference, underscoring areas for future work in reasoning with defaults and counterexamples.
Abstract
Generics express generalizations about the world (e.g., birds can fly) that are not universally true (e.g., newborn birds and penguins cannot fly). Commonsense knowledge bases, used extensively in NLP, encode some generic knowledge but rarely enumerate such exceptions. Yet knowing when a generic statement does or does not hold is crucial for developing a comprehensive understanding of generics. We present a novel framework informed by linguistic theory to generate exemplars: specific cases in which a generic holds or fails to hold. We generate ~19k exemplars for ~650 generics and show that our framework outperforms a strong GPT-3 baseline by 12.8 precision points. Our analysis highlights the importance of linguistic theory-based controllability for generating exemplars, the insufficiency of knowledge bases as a source of exemplars, and the challenges exemplars pose for the task of natural language inference.
