Irony in Emojis: A Comparative Study of Human and LLM Interpretation
Yawen Zheng, Hanjia Lyu, Jiebo Luo
TL;DR
This paper compares GPT-4o's interpretation of irony in emojis to human perception using the Ciron Weibo dataset, examining how age and gender prompts affect irony judgments. Humans rate irony on posts containing emojis, while GPT-4o rates the likelihood that an emoji expresses irony, enabling a direct comparison via statistical tests. The results show GPT-4o generally assigns higher irony potential than humans, with only modest alignment and clear age effects, highlighting model biases and contextual dependencies. The work underscores the need for cross-cultural and demographic-aware evaluation of emoji interpretation in AI systems for sentiment analysis and conversational agents.
Abstract
Emojis have become a universal language in online communication, often carrying nuanced and context-dependent meanings. Irony in particular poses a significant challenge for large language models (LLMs) because of the incongruity between an emoji's appearance and its intended meaning. This study examines the ability of GPT-4o to interpret irony in emojis. We prompt GPT-4o to evaluate the likelihood that specific emojis are used to express irony on social media and compare its interpretations with human perceptions, with the aim of characterizing the gap between machine and human understanding. Our findings show where GPT-4o's interpretations align with human behavior and where they diverge. Additionally, this research underscores the importance of demographic factors, such as age and gender, in shaping emoji interpretation and evaluates how these factors influence GPT-4o's performance.
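As a rough illustration of what comparing model and human irony judgments can look like, the sketch below computes a Spearman rank correlation and a mean score gap over per-emoji irony scores. It is only a sketch under stated assumptions: this section does not say which statistical tests the study uses, so Spearman correlation is chosen here purely for illustration, and the emoji keys, score values, and function names are hypothetical rather than taken from the Ciron dataset or from GPT-4o output.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-emoji irony scores in [0, 1]; not real study data.
human_scores = {"emoji_a": 0.10, "emoji_b": 0.35, "emoji_c": 0.20, "emoji_d": 0.60}
model_scores = {"emoji_a": 0.30, "emoji_b": 0.50, "emoji_c": 0.55, "emoji_d": 0.90}

emojis = sorted(human_scores)
rho = spearman(
    [human_scores[e] for e in emojis],
    [model_scores[e] for e in emojis],
)
# A positive gap means the model scores irony higher than humans on average.
mean_gap = mean(model_scores[e] - human_scores[e] for e in emojis)

print(f"Spearman rho: {rho:.2f}, mean model-minus-human gap: {mean_gap:.2f}")
```

The rank correlation captures whether the model and humans order emojis similarly, while the mean gap captures whether one side scores irony systematically higher; the two can disagree, which is why both are reported in this sketch.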
