ExpressivityBench: Can LLMs Communicate Implicitly?

Joshua Tint, Som Sagar, Aditya Taparia, Kelly Raines, Bimsara Pathiraja, Caleb Liu, Ransalu Senanayake

TL;DR

ExpressivityBench introduces an information-theoretic framework to quantify LLM expressivity, i.e., the ability to implicitly convey signals like emotion, tone, and identity, via a channel-based evaluation using a blind grader. The approach computes mutual information $I(s; \hat{s})$ between true signals $s$ and grader guesses $\hat{s}$, with normalization $N = I(s; \hat{s}) / H(s)$ to compare against human baselines. Across nine tasks, the study finds strong model performance on affective and narrative signals but persistent gaps in sociolinguistic signals such as political slant, age, and gender, underscoring that expressivity alone does not equate to human-like communication. The work validates graders against human judgments, releases code and data, and discusses implications for applications requiring socially aware dialogue, while highlighting the need to manage potential hyper-expressivity and to extend evaluation to multilingual and culturally diverse contexts.
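To make the metric concrete, the following is a minimal sketch of how $I(s; \hat{s})$ and $N = I(s; \hat{s}) / H(s)$ could be estimated from paired lists of true signals and grader guesses. It assumes a simple plug-in (empirical frequency) estimator measured in bits; the function names `entropy`, `mutual_information`, and `normalized_expressivity` are hypothetical and are not taken from the paper's released code, which may use a different estimator or bias correction.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(s) in bits of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(signals, guesses):
    """Plug-in estimate of I(s; s_hat) in bits from paired samples."""
    n = len(signals)
    joint = Counter(zip(signals, guesses))  # counts of (s, s_hat) pairs
    count_s = Counter(signals)              # marginal counts of true signals
    count_g = Counter(guesses)              # marginal counts of grader guesses
    mi = 0.0
    for (s, g), c in joint.items():
        # p(s, g) * log2( p(s, g) / (p(s) * p(g)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_s[s] * count_g[g]))
    return mi


def normalized_expressivity(signals, guesses):
    """N = I(s; s_hat) / H(s); defined as 0.0 when H(s) is zero (assumption)."""
    h = entropy(signals)
    return mutual_information(signals, guesses) / h if h > 0 else 0.0


# Toy data (illustrative only, not from the benchmark).
true_signals = ["joy", "anger", "joy", "anger"]
perfect_guesses = ["joy", "anger", "joy", "anger"]
unrelated_guesses = ["joy", "joy", "anger", "anger"]

print(normalized_expressivity(true_signals, perfect_guesses))    # 1.0
print(normalized_expressivity(true_signals, unrelated_guesses))  # 0.0
```

Under this sketch, a grader that always recovers the signal gives $N = 1$, while guesses that are statistically independent of the signal give $N = 0$. Dividing by $H(s)$ is what makes scores comparable across tasks with different numbers of signal classes.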

Abstract

Human communication is often implicit, conveying tone, identity, and intent beyond literal meanings. While large language models have achieved strong performance on explicit tasks such as summarization and reasoning, their capacity for expressivity, or implicit communication, remains underexplored. We introduce ExpressivityBench, a framework for evaluating the expressivity of LLMs using information-theoretic communication models. Our approach quantifies how well LLM-generated text communicates target properties without explicit mention, across nine tasks spanning emotion, identity, and tone. To enable scalable and reproducible evaluation, we employ LLM-based graders validated against human judgments. Our results reveal that while models are adept at expressing affective content, they struggle with sociolinguistic signals, lagging behind human baselines. This study provides a necessary step to evaluate human-like implicit communication, with implications for applications such as education, mental health support, and socially aware dialogue systems. We provide code and data for our benchmark alongside our paper.

Paper Structure

This paper contains 30 sections, 4 equations, 7 figures, 8 tables, 1 algorithm.

Figures (7)

  • Figure 1: ExpressivityBench tests LLMs on their ability to implicitly express information using an information-theoretic channel method, measuring a generator's ability to faithfully convey an implicit signal to a grader.
  • Figure 2: Raw mutual information scores $I(\hat{s}; s)$ for each model across different ExpressivityBench tasks.
  • Figure 3: Unfilled copy of the human study survey used in our evaluation.
  • Figure 4: An example of an LLM-LLM conversation.
  • Figure 5: A subset of the confusion matrix for GPT 3.5 on having a conversation over different emotions in experiment 3. We can see that most of the conversations defaulted to positive signals, mainly "Admiration" and "Gratitude."
  • ...and 2 more figures