Semantics Preserving Emoji Recommendation with Large Language Models

Zhongyi Qiu, Kangyi Qiu, Hanjia Lyu, Wei Xiong, Jiebo Luo

TL;DR

The paper reframes emoji recommendation: instead of requiring an exact match with the ground-truth emojis, it evaluates whether the predicted emojis preserve the semantics of the original text, measured by whether the affective state, demographic cues, and stance inferred from the text remain unchanged. It introduces a benchmark built on PAN18, an evaluation pipeline based on downstream classification tasks, and multiple prompting strategies to assess how well large language models produce semantically consistent emoji sets. GPT-4o delivers the strongest semantics-preservation performance, with up to 79.23% average accuracy in zero-shot settings, while conditioning on user demographics significantly boosts results for several models. The work also examines bias and diversity in emoji usage and proposes directions such as multilingual expansion and debiasing to improve real-world applicability.

Abstract

Emojis have become an integral part of digital communication, enriching text by conveying emotions, tone, and intent. Existing emoji recommendation methods are primarily evaluated based on their ability to match the exact emoji a user chooses in the original text. However, this ignores a basic property of user behavior on social media: each text can correspond to multiple reasonable emojis. To better assess a model's ability to align with such real-world emoji usage, we propose a new semantics preserving evaluation framework for emoji recommendation, which measures a model's ability to recommend emojis that maintain semantic consistency with the user's text. To evaluate how well a model preserves semantics, we assess whether the predicted affective state, demographic profile, and attitudinal stance of the user remain unchanged. If these attributes are preserved, we consider the recommended emojis to have maintained the original semantics. The advanced abilities of Large Language Models (LLMs) in understanding and generating nuanced, contextually relevant output make them well-suited for handling the complexities of semantics preserving emoji recommendation. To this end, we construct a comprehensive benchmark to systematically assess the performance of six proprietary and open-source LLMs using different prompting techniques on our task. Our experiments demonstrate that GPT-4o outperforms other LLMs, achieving a semantics preservation score of 79.23%. Additionally, we conduct case studies to analyze model biases in downstream classification tasks and evaluate the diversity of the recommended emojis.
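
The evaluation idea in the abstract can be illustrated with a minimal sketch. It assumes that each semantic dimension is judged by a classifier that maps a text to a label, that a recommendation counts as preserved on a dimension when the label for text + recommended emojis equals the label for text + original emojis, and that the overall score is the unweighted mean across dimensions. The function name `preservation_scores`, the `Classifier` type, and the toy classifiers are hypothetical; the paper's actual classifiers and aggregation may differ.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical classifier type: takes a text (including emojis), returns a label.
Classifier = Callable[[str], str]


def preservation_scores(
    originals: List[str],
    recommended: List[str],
    classifiers: Dict[str, Classifier],
) -> Tuple[Dict[str, float], float]:
    """Return per-dimension preservation rates and their unweighted mean.

    originals[i] is a text with the user's own emojis; recommended[i] is the
    same text with the model's emojis substituted in.
    """
    if len(originals) != len(recommended):
        raise ValueError("originals and recommended must have the same length")
    if not originals or not classifiers:
        raise ValueError("need at least one text pair and one classifier")

    per_task: Dict[str, float] = {}
    for task, classify in classifiers.items():
        # A pair is preserved when the predicted label does not change.
        kept = sum(classify(o) == classify(r) for o, r in zip(originals, recommended))
        per_task[task] = kept / len(originals)

    average = sum(per_task.values()) / len(per_task)
    return per_task, average


# Toy stand-ins for real downstream classifiers (assumptions for illustration).
def toy_sentiment(text: str) -> str:
    return "positive" if ("😊" in text or "😍" in text) else "negative"


def toy_stance(text: str) -> str:
    return "none"


originals = ["great day 😊", "so tired 😴"]
recommended = ["great day 😍", "so tired 😊"]
per_task, average = preservation_scores(
    originals, recommended, {"sentiment": toy_sentiment, "stance": toy_stance}
)
print(per_task, average)
```

In this toy run the first pair keeps its sentiment label while the second flips from negative to positive, so only half of the pairs are preserved on sentiment even though neither recommendation matches the original emoji exactly.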

Paper Structure (25 sections, 4 figures, 8 tables)

Figures (4)

  • Figure 1: Comparison of traditional exact match and semantics preservation approaches for emoji recommendation. The semantics preservation approach can suggest multiple emojis that maintain the semantic meaning of the text, even if they differ from the ground truth emojis.
  • Figure 2: Overview of the Semantics Preserving Emoji Recommendation Framework. Left side: The Emoji Recommendation Process uses large language models to recommend three emojis for texts from the benchmark dataset. Right side: The Semantics Preserving Evaluation Process compares text + predicted emojis with text + ground truth emojis across 5 selected semantic dimensions, including sentiment, emotion, stance, age, and gender.
  • Figure 3: Comparison of data distribution across five downstream tasks in the benchmark dataset before and after balancing.
  • Figure 4: Distribution of the top 50 most frequently used emojis across different models.