Are Large Language Models Chronically Online Surfers? A Dataset for Chinese Internet Meme Explanation
Yubo Xie, Chenkai Wang, Zongyang Ma, Fahui Miao
TL;DR
This study introduces CHIME, a dataset of simplified Chinese phrase-based memes annotated with meaning, origin, and usage, designed to evaluate how well large language models (LLMs) understand memes. The authors design two tasks to probe both receptive and productive meme literacy in multiple LLMs under zero-shot prompting (with limited one-shot experiments): explaining a meme's meaning and origin, and selecting the right meme in a multiple-choice question (MCQ). Automatic and human evaluations reveal that LLMs struggle with linguistically nuanced memes and with identifying accurate origins, although larger models perform better and contextual meaning can boost MCQ accuracy. The work provides a public benchmark, demonstrates notable gaps between machine and human meme understanding, and points to future directions, including multimodal meme handling and culturally grounded model development.
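To make the zero-shot versus one-shot distinction concrete, the sketch below builds a prompt for the explanation task with and without a single worked demonstration. It is a minimal illustration under stated assumptions: the `Demo` class, the `build_explanation_prompt` function, and the instruction wording are all hypothetical and are not the authors' actual prompts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Demo:
    """A hypothetical worked example used for one-shot prompting."""
    meme: str
    meaning: str
    origin: str
    example: str


# Hypothetical instruction text; NOT the prompt wording used in the paper.
INSTRUCTION = (
    "Explain the following Chinese Internet meme. "
    "Give its meaning, its origin, and one example sentence."
)


def build_explanation_prompt(meme: str, demo: Optional[Demo] = None) -> str:
    """Return a zero-shot prompt, or a one-shot prompt if a demo is given."""
    parts = [INSTRUCTION]
    if demo is not None:
        # One-shot: insert a single solved example before the query meme.
        parts.append(
            f"Meme: {demo.meme}\n"
            f"Meaning: {demo.meaning}\n"
            f"Origin: {demo.origin}\n"
            f"Example: {demo.example}"
        )
    # The query meme always comes last, with the answer left open for the model.
    parts.append(f"Meme: {meme}\nMeaning:")
    return "\n\n".join(parts)


# Usage with placeholder strings (not real dataset entries):
zero_shot = build_explanation_prompt("MEME_B")
one_shot = build_explanation_prompt(
    "MEME_B", Demo("MEME_A", "placeholder meaning", "placeholder origin", "placeholder sentence")
)
```

The only difference between the two settings here is whether one solved example precedes the query; everything else in the prompt is held fixed so that any change in output can be attributed to the demonstration.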
Abstract
Large language models (LLMs) are trained on vast amounts of text from the Internet, but do they truly understand the viral content that rapidly spreads online, commonly known as memes? In this paper, we introduce CHIME, a dataset for CHinese Internet Meme Explanation. The dataset comprises popular phrase-based memes from the Chinese Internet, annotated with detailed information on their meaning, origin, example sentences, type, and more. To evaluate whether LLMs understand these memes, we designed two tasks. In the first task, we assessed the models' ability to explain a given meme, identify its origin, and generate appropriate example sentences. The results show that while LLMs can explain the meanings of some memes, their performance declines significantly for culturally and linguistically nuanced meme types. Additionally, they consistently struggle to provide accurate origins for the memes. In the second task, we created a set of multiple-choice questions (MCQs) requiring LLMs to select the most appropriate meme to fill in a blank within a contextual sentence. While the evaluated models could provide correct answers, their performance remained noticeably below human levels. We have made CHIME public and hope it will facilitate future research on computational meme understanding.
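The abstract describes annotated meme entries and a fill-in-the-blank MCQ task scored against human performance. The sketch below shows one plausible way to represent such records and compute MCQ accuracy. The field names (`phrase`, `meaning`, `origin`, `meme_type`, `examples`), the `parse_choice` heuristic, and the scoring helpers are hypothetical and do not reflect the released dataset's actual schema or the paper's evaluation code.

```python
import re
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MemeEntry:
    """Hypothetical layout of one annotated meme; field names are illustrative."""
    phrase: str
    meaning: str
    origin: str
    meme_type: str
    examples: List[str] = field(default_factory=list)


@dataclass
class MCQItem:
    """A fill-in-the-blank question: pick the meme that best fits the context."""
    context: str          # sentence containing a blank, e.g. "... ___ ..."
    options: List[str]    # candidate meme phrases
    answer: int           # index of the correct option


def parse_choice(response: str, n_options: int) -> int:
    """Map the first standalone option letter in a reply to an index; -1 if none.

    Naive heuristic (an assumption): a letter counts only if it is not adjacent
    to other ASCII letters, so "答案是B" and "Answer: C" both parse, but an
    English article such as "A" at the start of a sentence would be misread.
    """
    letters = "ABCDEFGHIJ"[:n_options]
    match = re.search(rf"(?<![A-Za-z])([{letters}])(?![A-Za-z])", response)
    return letters.index(match.group(1)) if match else -1


def mcq_accuracy(items: Sequence[MCQItem], predictions: Sequence[int]) -> float:
    """Fraction of items whose predicted option index equals the gold index."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        return 0.0
    correct = sum(item.answer == pred for item, pred in zip(items, predictions))
    return correct / len(items)
```

Unparseable replies map to `-1`, which never matches a gold index, so they count as wrong rather than being silently dropped; the same accuracy function could then be applied to human answers for the model-versus-human comparison.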
