Are LLMs good pragmatic speakers?
Mingyue Jian, N. Siddharth
TL;DR
This work investigates whether vanilla LLMs behave as pragmatic speakers in the sense of the Rational Speech Act (RSA) framework, using a reference-game task built from the TUNA corpus. It compares scores from a vanilla LLM (Llama3-8B-Instruct) against RSA scores computed with two meaning-function (MF) variants, prompt-based and rule-based, over both top-k and logic-derived utterances, using Pearson and Spearman correlations to quantify alignment. The findings show positive but inconclusive correlations between LLM scores and RSA predictions, with stronger alignment when RSA uses a rule-based MF over logic-derived utterances and weaker alignment for top-k utterances, suggesting that current LLMs do not robustly behave as pragmatic speakers in this setting. The study highlights the need for further work, including human-subject experiments, evaluation of additional models, iterated RSA analyses, and broader domains, to clarify under what conditions LLMs can approximate pragmatic speaker behavior.
Abstract
Large language models (LLMs) are trained on data assumed to include natural language pragmatics, but do they actually behave like pragmatic speakers? We attempt to answer this question using the Rational Speech Act (RSA) framework, which models pragmatic reasoning in human communication. Using the paradigm of a reference game constructed from the TUNA corpus, we score candidate referential utterances under both a state-of-the-art LLM (Llama3-8B-Instruct) and the RSA model, and compare the two sets of scores. Because RSA requires defining a set of alternative utterances and a truth-conditional meaning function, we explore this comparison under different choices for each of these requirements. We find that while scores from the LLM have some positive correlation with those from RSA, there is insufficient evidence to claim that the LLM behaves like a pragmatic speaker. This initial study paves the way for further targeted efforts exploring different models and settings, including human-subject evaluation, to see whether LLMs truly can, or can be made to, behave like pragmatic speakers.
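To make the two RSA requirements concrete, the following is a minimal sketch of a one-step RSA pragmatic speaker in a toy reference game. It is an illustration under stated assumptions, not the setup used in this work: the objects, attributes, and utterance set are hypothetical (not drawn from the TUNA corpus), the meaning function is a simple rule-based word-matching check, the prior over objects is uniform, and the names `meaning`, `literal_listener`, and `pragmatic_speaker` as well as the parameters `alpha` and `cost_per_word` are chosen here for illustration.

```python
import math

# Hypothetical toy reference game; these objects are NOT from the TUNA corpus.
OBJECTS = {
    "o1": {"chair", "red", "large"},
    "o2": {"chair", "blue", "large"},
    "o3": {"desk", "red", "small"},
}

# Assumed set of alternative utterances available to the speaker.
UTTERANCES = ["chair", "red", "red chair", "large"]


def meaning(utterance, obj):
    """Rule-based truth-conditional meaning: 1.0 if every word holds of obj."""
    return 1.0 if all(w in OBJECTS[obj] for w in utterance.split()) else 0.0


def literal_listener(utterance):
    """L0(o | u) proportional to meaning(u, o), with a uniform prior over objects."""
    scores = {o: meaning(utterance, o) for o in OBJECTS}
    total = sum(scores.values())
    if total == 0:
        return scores  # utterance is true of no object
    return {o: s / total for o, s in scores.items()}


def pragmatic_speaker(target, alpha=1.0, cost_per_word=0.0):
    """S1(u | o) proportional to exp(alpha * (log L0(o | u) - cost(u)))."""
    weights = {}
    for u in UTTERANCES:
        p = literal_listener(u)[target]
        if p > 0:
            cost = cost_per_word * len(u.split())
            weights[u] = math.exp(alpha * (math.log(p) - cost))
        else:
            weights[u] = 0.0  # false utterances get zero probability
    z = sum(weights.values())
    if z == 0:
        raise ValueError("no utterance is true of the target")
    return {u: w / z for u, w in weights.items()}


speaker = pragmatic_speaker("o1")
for utterance, prob in sorted(speaker.items(), key=lambda kv: -kv[1]):
    print(f"{utterance!r}: {prob:.2f}")
```

In this toy game the speaker prefers "red chair" for target `o1`, because it is the only utterance that uniquely identifies that object. A distribution of this kind is what the LLM's utterance scores would be correlated against; changing `UTTERANCES` or `meaning` corresponds to the different choices of alternatives and meaning function mentioned above.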
