WUGNECTIVES: Novel Entity Inferences of Language Models from Discourse Connectives
Daniel Brubaker, William Sheffield, Junyi Jessy Li, Kanishka Misra
TL;DR
This work investigates whether discourse connectives can inform language models (LMs) about the world by testing the inferences LMs draw about novel entities expressed through nonce words. The authors introduce Wugnectives, a benchmark of 740 utterances embedded in 12 prompt templates, yielding 8,880 stimuli that map connectives to four inference senses (Instantiation, Concession, Contingency, Temporal) and three stimulus types. Evaluating 17 LMs across scales and training regimens, they find that reasoning-based tuning yields robust gains, particularly for non-concessive inferences, while concessive connectives remain systematically challenging and neither model size nor instruction tuning shows a consistent global effect. The results highlight the nuanced role of language cues in semantic learning and motivate further study of reasoning-enabled models and broader connective-based evaluation. The dataset is released under an MIT license to foster ongoing analysis of connective-driven world knowledge in LMs.
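The stimulus count follows from crossing every utterance with every prompt template (740 × 12 = 8,880). A minimal sketch of that arithmetic is shown here; the utterance and template strings are hypothetical placeholders, not the actual Wugnectives data or the authors' construction code.

```python
from itertools import product

# Counts taken from the summary above; the contents are made-up placeholders.
NUM_UTTERANCES = 740
NUM_TEMPLATES = 12

# Hypothetical stand-ins for the real utterances and prompt templates.
utterances = [f"utterance_{i}" for i in range(NUM_UTTERANCES)]
templates = [f"Template {j}: {{utterance}}" for j in range(NUM_TEMPLATES)]

# Embed every utterance in every template (a full cross product).
stimuli = [t.format(utterance=u) for u, t in product(utterances, templates)]

print(len(stimuli))  # 8880
```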
Abstract
World knowledge has been particularly crucial for predicting the discourse connective that marks the discourse relation between two arguments, and language models (LMs) are generally successful at this task. We flip this premise and instead study the inverse problem: whether discourse connectives can inform LMs about the world. To this end, we present WUGNECTIVES, a dataset of 8,880 stimuli that evaluates LMs' inferences about novel entities in contexts where connectives link the entities to particular attributes. Investigating 17 different LMs at various scales and training regimens, we found that tuning an LM to show reasoning behavior yields noteworthy improvements on most connectives. At the same time, LMs' overall performance varied widely across connective types, with all models systematically struggling on connectives that express a concessive meaning. Our findings pave the way for more nuanced investigations into the functional role of language cues as captured by LMs. We release WUGNECTIVES at https://github.com/sheffwb/wugnectives.
