What Media Frames Reveal About Stance: A Dataset and Study about Memes in Climate Change Discourse
Shijia Zhou, Siyao Peng, Simon M. Luebke, Jörg Haßler, Mario Haim, Saif M. Mohammad, Barbara Plank
TL;DR
The paper examines how media framing relates to stance in climate-change memes and introduces ClimateMemes, a dataset of 1,184 memes from 47 subreddits annotated for stance and eight media frames. It benchmarks vision-language models (VLMs) and their large language model (LLM) backbones on stance and frame detection in zero- to few-shot settings, with OCR text and human or synthetic captions as additional inputs. The findings indicate that frames carry a meaningful signal about stance: VLMs perform well on stance detection, while LLMs outperform VLMs on frame detection. Human captions and annotated frames further boost accuracy, and frame labels can aid stance detection in multi-task settings. The work provides a resource and framework for studying multimodal political communication and AI understanding of memes, while acknowledging biases and limitations such as Reddit-only data, single-annotator labeling, and temporal sampling constraints.
Abstract
Media framing refers to the emphasis on specific aspects of perceived reality to shape how an issue is defined and understood. Its primary purpose is to shape public perceptions, often in alignment with the authors' opinions and stances. However, the interaction between stance and media frame remains largely unexplored. In this work, we apply an interdisciplinary approach to conceptualize and computationally explore this interaction with internet memes on climate change. We curate CLIMATEMEMES, the first dataset of climate-change memes annotated with both stance and media frames, inspired by research in communication science. CLIMATEMEMES includes 1,184 memes sourced from 47 subreddits, enabling analysis of frame prominence across time and communities, and sheds light on the framing preferences of different stance holders. We propose two meme understanding tasks: stance detection and media frame detection. We evaluate two vision-language models (VLMs), LLaVA-NeXT and Molmo, in various setups, and report the corresponding results on their LLM backbones. Human captions consistently enhance performance. Synthetic captions and human-corrected OCR also help occasionally. Our findings highlight that VLMs perform well on stance but struggle with frames, where LLMs outperform VLMs. Finally, we analyze VLMs' limitations in handling nuanced frames and stance expressions in climate change internet memes.
