WUGNECTIVES: Novel Entity Inferences of Language Models from Discourse Connectives

Daniel Brubaker, William Sheffield, Junyi Jessy Li, Kanishka Misra

TL;DR

This work investigates whether discourse connectives can inform language models about the world by testing inferences about novel entities expressed through nonce words. The authors introduce WUGNECTIVES, a benchmark of 740 utterances embedded in 12 prompt templates, yielding 8,880 stimuli (740 × 12) that map connectives to four inference senses (Instantiation, Concession, Contingency, Temporal) and three stimulus types. Evaluating 17 LMs across scales and training regimens, they find that tuning a model to show reasoning behavior yields robust gains on most connectives, particularly non-concessive ones, while concessive connectives remain systematically challenging. No consistent overall effect of model size or instruction tuning emerges. The results highlight the nuanced role of language cues in semantic learning and motivate further study of reasoning-enabled models and broader connective-based evaluation. The dataset is released under an MIT license to support further analysis of connective-driven world knowledge in LMs.
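The stimulus count follows from crossing every utterance with every prompt template. The sketch below only illustrates that arithmetic: the utterance strings, the template strings, and the `build_stimuli` helper are hypothetical placeholders, not the contents or code of the released dataset.

```python
from itertools import product

# Counts taken from the summary above; the strings themselves are made up.
N_UTTERANCES = 740
N_TEMPLATES = 12


def build_stimuli(utterances, templates):
    """Embed every utterance in every prompt template (full cross product)."""
    return [t.format(utterance=u) for u, t in product(utterances, templates)]


# Placeholder data standing in for the real utterances and templates.
utterances = [f"utterance {i}" for i in range(N_UTTERANCES)]
templates = [f"Template {j}: {{utterance}}" for j in range(N_TEMPLATES)]

stimuli = build_stimuli(utterances, templates)
print(len(stimuli))  # 740 * 12 = 8880
```

Because each utterance appears once per template, per-utterance accuracy can be averaged over the 12 prompt variants, which is consistent with the "across prompts" averaging mentioned in the figure captions.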

Abstract

World knowledge has been particularly crucial for predicting the discourse connective that marks the discourse relation between two arguments, and language models (LMs) are generally successful at this task. We flip this premise in our work and instead study the inverse problem: whether discourse connectives can inform LMs about the world. To this end, we present WUGNECTIVES, a dataset of 8,880 stimuli that evaluates LMs' inferences about novel entities in contexts where connectives link the entities to particular attributes. Investigating 17 different LMs at various scales and training regimens, we found that tuning an LM to show reasoning behavior yields noteworthy improvements on most connectives. At the same time, there was large variation in LMs' overall performance across connective types, with all models systematically struggling on connectives that express a concessive meaning. Our findings pave the way for more nuanced investigations into the functional role of language cues as captured by LMs. We release WUGNECTIVES at https://github.com/sheffwb/wugnectives.

Paper Structure

This paper contains 43 sections, 3 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Past work has largely focused on the prediction of connectives given some input context, usually requiring access to world knowledge. We reduce this reliance by using novel entities, and analyze whether LMs can rely on their knowledge of the connectives themselves to make inferences about the world.
  • Figure 2: Accuracy of LMs across connective senses. The black dashed line indicates chance performance (50%). Error bars indicate 95% confidence intervals measured across connectives and prompt variation.
  • Figure 3: Mean accuracy (across prompts) of models by connective for the sense Expansion.Instantiation.
  • Figure 4: Accuracy of the top five models on succession stimuli without "even though", compared with their performance on "even though". Model names are abbreviated to save space. Q: "Qwen", I: "Instruct".
  • Figure 5: Results on Comparison connectives.
  • ...and 2 more figures