Do language models capture implied discourse meanings? An investigation with exhaustivity implicatures of Korean morphology
Hagyeong Shin, Sean Trott
TL;DR
The study investigates whether distributional semantics in large language models can encode discourse-level meanings associated with Korean Differential Object Marking (DOM), focusing on lul, nun, and null-marking. It conducts processing and production experiments across several models (KoGPT variants, Polyglot-Ko, GPT-3, ChatGPT), using surprisal, ratings, log-probabilities, and forced-choice tasks, analyzed with mixed-effects models. Findings show that some large models (notably GPT-3 and Polyglot-Ko-12B) exhibit partial sensitivity to exhaustivity implicatures, especially for the nun marker, but encoding of dual meanings across markers remains inconsistent, and lul is less likely to be associated with a discourse meaning. The results suggest that distributional semantics alone provides only a baseline for discourse pragmatics in Korean DOM; improvements from scaling and human feedback hint at potential but do not fully replicate human-like discourse interpretation.
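As a rough illustration of two of the measures named above, the sketch below converts a model's log-probability for a marker into surprisal (the negative log-probability, here in bits) and resolves a forced choice by taking the most probable candidate. The function names and the probabilities are hypothetical placeholders chosen for illustration; they are not values or code from the study.

```python
import math

def surprisal_bits(log_prob_nat: float) -> float:
    """Convert a natural-log probability into surprisal measured in bits."""
    return -log_prob_nat / math.log(2)

def forced_choice(log_probs: dict[str, float]) -> str:
    """Return the candidate marker with the highest log-probability."""
    return max(log_probs, key=log_probs.get)

# Hypothetical log-probabilities for three object markings in one context
# (made-up numbers, not results from the paper).
example = {
    "lul": math.log(0.2),
    "nun": math.log(0.5),
    "null": math.log(0.3),
}

# Lower surprisal means the model finds the marker more expected in context.
surprisals = {marker: surprisal_bits(lp) for marker, lp in example.items()}
choice = forced_choice(example)
```

In practice the log-probabilities would come from a language model scoring each marker in a given discourse context, and the resulting surprisal values would then feed a statistical analysis such as a mixed-effects model.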
Abstract
Markedness in natural language is often associated with non-literal meanings in discourse. Differential Object Marking (DOM) in Korean is one instance of this phenomenon, where post-positional markers are selected based on both the semantic features of the noun phrases and the discourse features that are orthogonal to the semantic features. Previous work has shown that distributional models of language recover certain semantic features of words -- do these models capture implied discourse-level meanings as well? We evaluate whether a set of large language models is capable of associating discourse meanings with different object markings in Korean. Results suggest that the discourse meanings of a grammatical marker can be more challenging to encode than those of a discourse marker.
