Memes-as-Replies: Can Models Select Humorous Manga Panel Responses?

Ryosuke Kohita; Seiichiro Yoshioka

TL;DR

This paper introduces the Meme Reply Selection task and presents MaMe-Re (Manga Meme Reply Benchmark), a benchmark of 100,000 human-annotated pairs of openly licensed Japanese manga panels and social media posts. The results suggest that selecting contextually humorous replies remains an open challenge for current models.

Abstract

Memes are a popular element of modern web communication, used not only as static artifacts but also as interactive replies within conversations. While computational research has focused on analyzing the intrinsic properties of memes, the dynamic and contextual use of memes to create humor remains an understudied area of web science. To address this gap, we introduce the Meme Reply Selection task and present MaMe-Re (Manga Meme Reply Benchmark), a benchmark of 100,000 human-annotated pairs (500,000 total annotations from 2,325 unique annotators) consisting of openly licensed Japanese manga panels and social media posts. Our analysis reveals three key insights: (1) large language models (LLMs) show preliminary evidence of capturing complex social cues such as exaggeration, moving beyond surface-level semantic matching; (2) the inclusion of visual information does not improve performance, revealing a gap between understanding visual content and effectively using it for contextual humor; (3) while LLMs can match human judgments in controlled settings, they struggle to distinguish subtle differences in wit among semantically similar candidates. These findings suggest that selecting contextually humorous replies remains an open challenge for current models.

Paper Structure (38 sections, 5 equations, 7 figures, 2 tables)

Introduction
Related Work
Web Communication and Multimodal Signals.
From Static Content to Interactive Replies.
Humor Mechanisms in Memes and Dialogue.
Problem Formulation
Task Definition.
Dataset Requirements.
Evaluation Metric.
MaMe-Re: Manga Meme Reply Benchmark
Content Collection and Curation.
Funniness Annotation.
Dataset Statistics.
Reply Selection Methods
Similarity-based Selection (sim-select).
...and 23 more sections

Figures (7)

Figure 1: Overview of memes-as-replies. (a) Example of meme use on SNS. (b) Visualization of the Meme Reply Selection task. (c) MaMe-Re benchmark with crowdsourced humor labels.
Figure 2: Crowdworker annotation interface and full task instruction for the funniness scoring task in MaMe-Re. Top: interface screenshot. Bottom: instruction text shown to annotators.
Figure 3: Prompt template. ${FORMAT} is either "id, speech" or "id, speech, description", and ${CANDIDATES} contains the meme candidates in the corresponding CSV format.
Figure 4: Main experimental results for Exp1. (a) Table showing the performance ranking across models and methods. S/P: similarity/preference-based; Y/N: with/without descriptions; CHR: Consensus Hit Rate; values in parentheses denote 95% confidence intervals. (b)–(d) Plots of score distributions categorized by panel descriptions, LLMs, and embedding models, respectively.
Figure 5: Performance of the retrieve-and-rerank approach (Exp2).
...and 2 more figures
