Are Large Language Models Chronically Online Surfers? A Dataset for Chinese Internet Meme Explanation

Yubo Xie, Chenkai Wang, Zongyang Ma, Fahui Miao

TL;DR

This study introduces CHIME, a dataset of Simplified Chinese phrase-based memes annotated with their meaning, origin, and usage, built to evaluate how well large language models understand memes. The authors design two tasks to probe both receptive and productive meme literacy in multiple LLMs under zero-shot prompting (with limited one-shot experiments): explaining a meme's meaning and origin, and selecting the appropriate meme in a multiple-choice question (MCQ). Automatic and human evaluations show that LLMs struggle with linguistically nuanced memes and with identifying accurate origins, although larger models perform better and contextual meaning can improve MCQ accuracy. The work provides a public benchmark, demonstrates notable gaps between machine and human meme understanding, and points to future directions, including multimodal meme handling and culturally grounded model development.

Abstract

Large language models (LLMs) are trained on vast amounts of text from the Internet, but do they truly understand the viral content that rapidly spreads online, commonly known as memes? In this paper, we introduce CHIME, a dataset for CHinese Internet Meme Explanation. The dataset comprises popular phrase-based memes from the Chinese Internet, annotated with detailed information on their meaning, origin, example sentences, types, etc. To evaluate whether LLMs understand these memes, we designed two tasks. In the first task, we assessed the models' ability to explain a given meme, identify its origin, and generate appropriate example sentences. The results show that while LLMs can explain the meanings of some memes, their performance declines significantly for culturally and linguistically nuanced meme types. Additionally, they consistently struggle to provide accurate origins for the memes. In the second task, we created a set of multiple-choice questions (MCQs) requiring LLMs to select the most appropriate meme to fill in a blank within a contextual sentence. While the evaluated models are able to provide correct answers, their performance remains noticeably below human levels. We have made CHIME public and hope it will facilitate future research on computational meme understanding.

Paper Structure

This paper contains 42 sections, 3 figures, 13 tables.

Figures (3)

  • Figure 1: An example from our CHIME dataset.
  • Figure 2: Average cosine similarity and BERTScore for the generated meanings of the candidate models, evaluated across each of the six meme types.
  • Figure 3: Average percentage of human ratings assigned as Agree for the generated meanings and example sentences of the candidate models, evaluated across each of the six meme types. The results of the origin task are omitted, as most memes with an identifiable origin belong to the quotation type.
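
Figure 2 reports per-type averages of cosine similarity (alongside BERTScore), and Figure 3 reports per-type percentages of Agree ratings. The following is a minimal sketch of that aggregation. It assumes that each generated and reference meaning has already been embedded as a vector by some sentence encoder, which this summary does not name, and that human ratings are stored as (meme_type, label) pairs. The function names and the placeholder type `type_2` are hypothetical; only `quotation` is a meme type named in the captions. BERTScore is not reimplemented here.

```python
import math
from collections import defaultdict


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def mean_by_type(records):
    """Average (meme_type, score) pairs into {meme_type: mean score}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for meme_type, score in records:
        totals[meme_type] += score
        counts[meme_type] += 1
    return {t: totals[t] / counts[t] for t in totals}


def agree_percentage_by_type(ratings):
    """Percentage of (meme_type, label) ratings per type whose label is 'Agree'."""
    indicator = ((t, 1.0 if label == "Agree" else 0.0) for t, label in ratings)
    return {t: 100 * m for t, m in mean_by_type(indicator).items()}


# Toy vectors standing in for embeddings of (generated, reference) meanings.
pairs = [
    ("quotation", [1.0, 0.0], [1.0, 0.0]),
    ("quotation", [1.0, 1.0], [1.0, 0.0]),
    ("type_2", [1.0, 0.0], [0.0, 1.0]),
]
print(mean_by_type((t, cosine_similarity(g, r)) for t, g, r in pairs))
ratings = [("quotation", "Agree"), ("quotation", "Disagree"), ("type_2", "Agree")]
print(agree_percentage_by_type(ratings))  # {'quotation': 50.0, 'type_2': 100.0}
```

Any rating label other than "Agree" counts as not agreeing here; the full rating scale used by the annotators is not given in this summary.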