Multi-Modal Framing Analysis of News

Arnav Arora, Srishti Yadav, Maria Antoniak, Serge Belongie, Isabelle Augenstein

TL;DR

This work introduces a scalable, multimodal framework for framing analysis in news by leveraging large vision-language models to jointly analyze article text and accompanying images. It builds and analyzes a large-scale dataset of ~500K US-based articles across 28 outlets over 12 months, labeling both generic frames (15 categories) and issue-specific frames, with human-in-the-loop validation. Text framing achieves strong alignment with human annotations (top-line metrics around F1 ~0.5 on a 15-label set), while image framing shows meaningful but more modest agreement, highlighting the distinct signaling roles of visuals. Across the corpus and across individual topics, the study reveals systematic differences in how frames are conveyed in text versus images and in how framing varies with topic and political leaning, with immigration used as a detailed case study. The paper demonstrates the value of integrative, scalable multimodal framing analysis for understanding media bias and releases a large annotated dataset to support future research in computational framing and media studies.

Abstract

Automated frame analysis of political communication is a popular task in computational social science that is used to study how authors select aspects of a topic to frame its reception. So far, such studies have been narrow, in that they use a fixed set of pre-defined frames and focus only on the text, ignoring the visual contexts in which those texts appear. Especially for framing in the news, this leaves out valuable information about editorial choices, which include not just the written article but also accompanying photographs. To overcome such limitations, we present a method for conducting multi-modal, multi-label framing analysis at scale using large (vision-) language models. Grounding our work in framing theory, we extract latent meaning embedded in images used to convey a certain point and contrast that to the text by comparing the respective frames used. We also identify highly partisan framing of topics with issue-specific frame analysis found in prior qualitative work. We demonstrate a method for doing scalable integrative framing analysis of both text and image in news, providing a more complete picture for understanding media bias.
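To make the idea of "comparing the respective frames used" concrete, the following is a minimal sketch, not the authors' implementation. It assumes each article already carries two sets of multi-label frame predictions (one from the text, one from the image) and summarizes how often each frame appears per modality and how much the two sets overlap. The field names `text_frames` and `image_frames`, the frame labels, and the choice of Jaccard overlap are all illustrative assumptions.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap between two label collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_modalities(articles):
    """Count frames per modality and average the text/image overlap.

    `articles` is a list of dicts with hypothetical keys
    "text_frames" and "image_frames", each a list of frame labels.
    """
    text_counts = Counter()
    image_counts = Counter()
    overlaps = []
    for art in articles:
        # Use sets so a frame is counted at most once per article.
        text_counts.update(set(art["text_frames"]))
        image_counts.update(set(art["image_frames"]))
        overlaps.append(jaccard(art["text_frames"], art["image_frames"]))
    mean_overlap = sum(overlaps) / len(overlaps) if overlaps else 0.0
    return text_counts, image_counts, mean_overlap


# Toy input with made-up frame labels, for illustration only.
articles = [
    {"text_frames": ["economic", "policy"], "image_frames": ["policy"]},
    {"text_frames": ["security"], "image_frames": ["security", "crime"]},
]

text_counts, image_counts, mean_overlap = compare_modalities(articles)
print(text_counts.most_common())
print(image_counts.most_common())
print(mean_overlap)
```

A low mean overlap on real predictions would suggest that images and text tend to carry different frames for the same article, which is the kind of contrast the abstract describes; any such conclusion depends on the quality of the underlying frame predictions.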

Paper Structure

This paper contains 34 sections, 14 figures, 11 tables.

Figures (14)

  • Figure 1: News can be intentionally framed to affect reader perception. Editorial choices decide what is communicated through words and images. Our approach systematically detects this framing.
  • Figure 2: Distribution of data across the top 30 topics.
  • Figure 3: Distribution of data across the time period of collection, broken down by political leaning.
  • Figure 4: Frequency of predicted generic frames across all articles for texts and images.
  • Figure 5: Examples of generic frame prediction in images vs texts about immigration across political leanings.
  • ...and 9 more figures