MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal

Yiqi Nie, Fei Wang, Junjie Chen, Kun Li, Yudi Cai, Dan Guo, Chenglong Li, Meng Wang

Abstract

Memes represent a tightly coupled, multimodal form of social expression, in which visual context and overlaid text jointly convey nuanced affect and commentary. Inspired by cognitive reappraisal in psychology, we introduce Meme Reappraisal, a novel multimodal generation task that aims to transform negatively framed memes into constructive ones while preserving their underlying scenario, entities, and structural layout. Unlike prior work on meme understanding or generation, Meme Reappraisal requires emotion-controllable, structure-preserving multimodal transformation under multiple semantic and stylistic constraints. To support this task, we construct MER-Bench, a benchmark of real-world memes with fine-grained multimodal annotations, including source and target emotions, positively rewritten meme text, visual editing specifications, and taxonomy labels covering visual type, sentiment polarity, and layout structure. We further propose a structured evaluation framework based on a multimodal large language model (MLLM)-as-a-Judge paradigm, decomposing performance into modality-level generation quality, affect controllability, structural fidelity, and global affective alignment. Extensive experiments across representative image-editing and multimodal-generation systems reveal substantial gaps in satisfying the constraints of structural preservation, semantic consistency, and affective transformation. We believe MER-Bench establishes a foundation for research on controllable meme editing and emotion-aware multimodal generation. Our code is available at: https://github.com/one-seven17/MER-Bench.
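
To make the annotation fields and the four evaluation dimensions named in the abstract concrete, the following is a minimal sketch of how one benchmark record and one judge result might be represented. It is illustrative only: the class name `MemeAnnotation`, every field name, the dimension keys, and the helper `missing_dimensions` are hypothetical and are not taken from the MER-Bench repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemeAnnotation:
    """Hypothetical record layout; field names are assumptions, not the released schema."""
    meme_id: str
    source_emotion: str        # emotion conveyed by the original meme
    target_emotion: str        # emotion the reappraised meme should convey
    rewritten_text: str        # positively rewritten meme text
    visual_edit_spec: str      # free-text visual editing specification
    # Taxonomy labels: visual type, sentiment polarity, layout structure.
    taxonomy: Dict[str, str] = field(default_factory=dict)


# The four evaluation dimensions listed in the abstract (key names are assumed).
JUDGE_DIMENSIONS = (
    "modality_quality",
    "affect_controllability",
    "structural_fidelity",
    "global_affective_alignment",
)


def missing_dimensions(scores: Dict[str, float]) -> List[str]:
    """Return the evaluation dimensions that a judge result does not cover."""
    return [dim for dim in JUDGE_DIMENSIONS if dim not in scores]
```

A judge result would then be a plain dictionary keyed by dimension, and `missing_dimensions` gives a quick completeness check before any aggregation.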

Paper Structure (22 sections, 5 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Comparison of Meme Reappraisal with related tasks. Meme Reappraisal leverages psychology-informed emotion regulation to shift affect while preserving scenario content and meme-style consistency.
  • Figure 2: Overview of MER-Bench construction. (A) Psychological foundations ground the annotation protocol: the Russell circumplex specifies the emotion space, and the reappraisal-based rewriting task is derived from established emotion regulation theory. (B) Iterative rule setting and annotator training follow a Find-Resolve-Label pipeline with expert clarification and calibration. (C) LLM-assisted human rewriting adopts a three-step workflow: emotion detection and image description, positive solution and target emotion specification, and final meme generation.
  • Figure 3: MER-Bench provides a unified taxonomy and evaluation metrics for meme reappraisal.
  • Figure 4: Human validation and prompt ablation results for the proposed MLLM-as-a-Judge evaluation protocol on 100 randomly sampled Meme Reappraisal outputs.
  • Figure 5: Subcategory-wise and overall analysis on Meme Reappraisal. In each run, 700 memes are sampled as evenly as possible from the edited outputs of different models, and scores are averaged over 10 runs. The bar colors in (o) are consistent with the metric-specific color scheme used in Tables \ref{tab:category_metric_heat} and \ref{tab:main_results_sorted_heat}.
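
The sampling protocol in the Figure 5 caption (700 memes per run, drawn as evenly as possible across models, averaged over 10 runs) can be sketched as follows. This is one plausible reading, not the authors' implementation: the function names, the round-robin quota rule, sampling without replacement, and the assumption that each edited output carries a single numeric score are all hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List


def balanced_quota(total: int, sizes: Dict[str, int]) -> Dict[str, int]:
    """Split `total` across groups as evenly as possible without exceeding any group size."""
    quota = {name: 0 for name in sizes}
    remaining = total
    while remaining > 0:
        # Groups that can still contribute at least one more item.
        open_groups = [name for name in sizes if quota[name] < sizes[name]]
        if not open_groups:
            raise ValueError("not enough items to draw the requested sample")
        # Round-robin: hand out one slot per open group until none remain.
        for name in open_groups:
            if remaining == 0:
                break
            quota[name] += 1
            remaining -= 1
    return quota


def averaged_score(
    outputs: Dict[str, List[float]],
    total: int = 700,
    runs: int = 10,
    seed: int = 0,
) -> float:
    """Average the per-run mean score over `runs` balanced random samples."""
    rng = random.Random(seed)
    quota = balanced_quota(total, {m: len(s) for m, s in outputs.items()})
    run_means = []
    for _ in range(runs):
        sample: List[float] = []
        for model, scores in outputs.items():
            # Sample without replacement within each model (an assumption).
            sample.extend(rng.sample(scores, quota[model]))
        run_means.append(mean(sample))
    return mean(run_means)
```

With three models that each have at least 234 outputs, `balanced_quota(700, ...)` assigns 234, 233, and 233 items, which is the most even split of 700 into three parts.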