EMODIS: A Benchmark for Context-Dependent Emoji Disambiguation in Large Language Models

Jiacheng Huang, Ning Yu, Xiaoyin Yi

TL;DR

EMODIS introduces a targeted benchmark to evaluate how large language models resolve emoji-related ambiguity in context-dependent sentences. By pairing target sentences with two minimal yet contrasting contexts and a disambiguation question, the dataset probes pragmatic reasoning and contextual sensitivity. Empirical results show humans achieve near-perfect accuracy, while even the strongest LLMs exhibit substantial gaps, with API models outperforming open-source ones and a systematic bias toward figurative emoji readings. The work highlights critical limitations in current semantic and pragmatic reasoning and offers a rigorous testbed to guide future improvements in context-aware language understanding. Overall, EMODIS underscores the gap between human and machine interpretation of symbolic language and provides actionable diagnostics for enhancing context-sensitive NLP systems.

Abstract

Large language models (LLMs) are increasingly deployed in real-world communication settings, yet their ability to resolve context-dependent ambiguity remains underexplored. In this work, we present EMODIS, a new benchmark for evaluating LLMs' capacity to interpret ambiguous emoji expressions under minimal but contrastive textual contexts. Each instance in EMODIS comprises an ambiguous sentence containing an emoji, two distinct disambiguating contexts that lead to divergent interpretations, and a specific question that requires contextual reasoning. We evaluate both open-source and API-based LLMs, and find that even the strongest models frequently fail to distinguish meanings when only subtle contextual cues are present. Further analysis reveals systematic biases toward dominant interpretations and limited sensitivity to pragmatic contrast. EMODIS provides a rigorous testbed for assessing contextual disambiguation, and highlights the gap in semantic reasoning between humans and LLMs.
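The instance structure described above (one ambiguous sentence, two contrasting contexts, one question) could be represented roughly as follows. This is a minimal sketch, not the authors' data format or evaluation code: the class name `EmojiInstance`, its field names, the prompt template, the exact-match scoring, and the example sentence are all assumptions made for illustration and are not taken from the EMODIS dataset.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class EmojiInstance:
    """Hypothetical container for one benchmark item (field names are assumed)."""
    sentence: str               # ambiguous target sentence containing an emoji
    question: str               # disambiguation question
    contexts: Tuple[str, str]   # two minimal, contrasting contexts
    answers: Tuple[str, str]    # expected answer under each context


def build_prompts(inst: EmojiInstance) -> Tuple[str, ...]:
    """Pair the same sentence and question with each context in turn."""
    return tuple(
        f"Context: {ctx}\nSentence: {inst.sentence}\nQuestion: {inst.question}"
        for ctx in inst.contexts
    )


def accuracy(instances: Sequence[EmojiInstance],
             model: Callable[[str], str]) -> float:
    """Fraction of (context, answer) pairs the model gets right (exact match)."""
    correct = 0
    total = 0
    for inst in instances:
        for prompt, gold in zip(build_prompts(inst), inst.answers):
            prediction = model(prompt).strip().lower()
            correct += int(prediction == gold.lower())
            total += 1
    return correct / total if total else 0.0


# Invented example for illustration only; it does not come from the dataset.
example = EmojiInstance(
    sentence="I'm dead 💀",
    question="Is the speaker amused or exhausted?",
    contexts=("We just watched a comedy special.",
              "I just finished a marathon."),
    answers=("amused", "exhausted"),
)


def always_amused(prompt: str) -> str:
    """Stub model that ignores context and returns one dominant reading."""
    return "amused"


print(accuracy([example], always_amused))  # 0.5
```

A stub that always returns the same reading scores exactly 0.5 on a contrastive pair, which is why pairing contexts makes a bias toward a dominant interpretation visible in the score.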

Paper Structure

This paper contains 33 sections, 6 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: An illustration of our benchmark. The interpretation of a sentence containing an emoji can be significantly influenced by contextual information.
  • Figure 2: Taxonomy of our EMODIS benchmark. For each category, we provide two representative cases. Each case consists of a target sentence, a question, and two contrasting contexts that lead to different answers. Questions (Q), contexts (C), and answers (A) are labeled accordingly to highlight the disambiguation task.
  • Figure 3: Distribution of context taxonomy (left) and emoji categories (right) in our EMODIS benchmark.