Human vs. LMMs: Exploring the Discrepancy in Emoji Interpretation and Usage in Digital Communication
Hanjia Lyu, Weihong Qi, Zhongyu Wei, Jiebo Luo
TL;DR
This work investigates how GPT-4V interprets and uses emojis compared with humans in social-media contexts. It conducts two studies: first, comparing GPT-4V's single-word emoji descriptions to human annotations to assess interpretive alignment; second, prompting GPT-4V to generate three emojis for TikTok-like contexts and comparing usage patterns via embedding analyses. The results reveal category-dependent interpretive gaps and a mixture of human-like and divergent emoji usage, with notable biases likely stemming from English-centric training and cultural representation gaps. These findings highlight both the strengths and limitations of LMMs in navigating symbolic digital language and underscore the need for more culturally diverse data to improve emoji understanding in AI systems.
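The first study's comparison can be illustrated with a minimal sketch. It assumes a simple exact-match agreement rate between the model's single-word description and the human annotations for each emoji, which may differ from the measure the paper actually uses. The function names, the normalization step, and the toy data are all hypothetical and are not taken from the study.

```python
from collections import Counter


def normalize(word: str) -> str:
    """Lowercase and trim a description so trivial variants compare equal."""
    return word.strip().lower()


def agreement_rate(model_words: dict, human_words: dict) -> dict:
    """For each emoji, return the fraction of human annotations that
    exactly match the model's single-word description.

    Exact matching is an assumption made for illustration only.
    """
    rates = {}
    for emoji, model_word in model_words.items():
        annotations = [normalize(w) for w in human_words.get(emoji, [])]
        if not annotations:
            continue  # no human annotations available for this emoji
        counts = Counter(annotations)
        rates[emoji] = counts[normalize(model_word)] / len(annotations)
    return rates


# Toy data, invented for this example (not results from the paper).
model_words = {"😂": "laughter", "🙏": "prayer"}
human_words = {
    "😂": ["laughter", "funny", "Laughter ", "joy"],
    "🙏": ["thanks", "please", "prayer", "thanks"],
}

rates = agreement_rate(model_words, human_words)
for emoji, rate in rates.items():
    print(emoji, rate)
```

A low rate for an emoji such as 🙏 would indicate that human annotators favor a reading the model did not produce. Exact matching ignores synonyms, so an embedding-based similarity would be a natural refinement.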
Abstract
Leveraging Large Multimodal Models (LMMs) to simulate human behaviors when processing multimodal information, especially in the context of social media, has garnered immense interest due to its broad potential and far-reaching implications. Emojis, among the most distinctive features of digital communication, are pivotal in enriching and often clarifying its emotional and tonal dimensions. Yet there is a notable gap in understanding how advanced models such as GPT-4V interpret and employ emojis in the nuanced context of online interaction. This study aims to bridge this gap by examining how closely GPT-4V replicates human-like use of emojis. The findings reveal a discernible discrepancy between human and GPT-4V behaviors, likely due to the subjective nature of human interpretation and the limitations of GPT-4V's English-centric training, which suggest cultural biases and inadequate representation of non-English cultures.
