Reap the Wild Wind: Detecting Media Storms in Large-Scale News Corpora

Dror K. Markus, Effi Levi, Tamir Sheafer, Shaul R. Shenhav

TL;DR

This paper tackles the problem of operationalizing media storms in large news corpora, where storms are rare and difficult to label. It introduces a human-in-the-loop framework that converts daily news into dispersion signals across multiple embeddings and uses unsupervised anomaly detection (Prophet) with expert validation to identify storms. Two experimental setups (In-Period and Out-Period) demonstrate the method on 1996–2006 and 2007–2016 data, yielding a dataset of 221 storms. This dataset and method enable systematic analysis of storm dynamics, including their triggers, durations, and distribution across mainstream media and potentially social media.
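The "dispersion signals" mentioned above can be illustrated with a minimal sketch. Figure 1 describes the daily dispersion level as "the trace". Here we assume this means the trace of the covariance matrix of one day's article embeddings, computed with population (not sample) variance, and that each embedding is a fixed-length list of floats. The helper names `daily_dispersion` and `dispersion_signal` are hypothetical and are not the authors' code. Because the trace of a covariance matrix equals the sum of the per-dimension variances, the full matrix never needs to be built.

```python
from statistics import pvariance
from typing import Dict, List, Sequence

Vector = Sequence[float]

def daily_dispersion(embeddings: List[Vector]) -> float:
    """Trace of the population covariance matrix of one day's article embeddings.

    Assumption: "dispersion" = covariance trace, i.e. the sum of per-dimension
    variances, so the covariance matrix itself is never materialized.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two articles to measure dispersion")
    dims = len(embeddings[0])
    return sum(pvariance([vec[d] for vec in embeddings]) for d in range(dims))

def dispersion_signal(days: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Map each ISO date string to its dispersion level, in date order."""
    return {day: daily_dispersion(days[day]) for day in sorted(days)}

# Toy usage with made-up 2-d vectors: on the second day the articles cluster
# around one story, so that day's dispersion is lower.
corpus = {
    "2000-01-01": [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    "2000-01-02": [[0.9, 0.1], [1.0, 0.0], [1.1, -0.1]],
}
signal = dispersion_signal(corpus)
```

With several representations (Figure 1 shows LLM, plot-element, topic, and entity signals), the same function would simply be run once per representation to obtain one signal each.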

Abstract

Media storms, dramatic outbursts of attention to a story, are central components of media dynamics and the attention landscape. Despite their significance, there has been little systematic empirical research on this concept, owing to difficulties of measurement and operationalization. We introduce an iterative human-in-the-loop method to identify media storms in a large-scale corpus of news articles. The text is first transformed into signals of dispersion based on several textual characteristics. In each iteration, we apply unsupervised anomaly detection to these signals; an expert then validates each anomaly to confirm whether a storm is present, and the results are used to tune the anomaly detection in the next iteration. We demonstrate the applicability of this method in two scenarios: first, supplementing an initial list of media storms within a specific time frame; and second, detecting media storms in new time periods. We make available a media storm dataset compiled under both scenarios. Both the method and the dataset provide a basis for comprehensive empirical research into media storms, including characterizing them and predicting their outbursts and durations, in mainstream media or on social media platforms.
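The iterative loop described in the abstract can be sketched as follows. This is not the authors' implementation. The paper uses Prophet; here a trailing-median band scaled by the median absolute deviation (MAD) is swapped in as a dependency-free stand-in for Prophet's uncertainty interval. The `expert` callable stands in for the human reviewer, and the rule that tightens or loosens the threshold `k` based on the expert's confirmation rate is a hypothetical illustration of using validation results to tune the detector. All names (`flag_anomalies`, `human_in_the_loop`, `k`, `window`, `step`) are our own.

```python
from statistics import median
from typing import Callable, Dict, List, Set

def flag_anomalies(signal: Dict[str, float], k: float, window: int = 7) -> List[str]:
    """Flag days deviating from a trailing median by more than k MAD units.

    Stand-in for Prophet: the trailing median plays the role of the fitted
    trend and the MAD sets the band width. Both directions are flagged.
    """
    days = sorted(signal)
    flagged: List[str] = []
    for i in range(window, len(days)):
        history = [signal[d] for d in days[i - window:i]]
        center = median(history)
        mad = median([abs(v - center) for v in history]) or 1e-9  # avoid division by zero
        if abs(signal[days[i]] - center) / mad > k:
            flagged.append(days[i])
    return flagged

def human_in_the_loop(
    signal: Dict[str, float],
    expert: Callable[[str], bool],
    k: float = 3.0,
    iterations: int = 3,
    step: float = 0.5,
) -> Set[str]:
    """Alternate automatic detection with expert validation, retuning k each round."""
    confirmed: Set[str] = set()
    rejected: Set[str] = set()
    for _ in range(iterations):
        seen = confirmed | rejected
        candidates = [d for d in flag_anomalies(signal, k) if d not in seen]
        if not candidates:
            k = max(k - step, step)  # nothing new was found: loosen the threshold
            continue
        verdicts = {d: expert(d) for d in candidates}  # expert confirms or rejects each anomaly
        confirmed |= {d for d, ok in verdicts.items() if ok}
        rejected |= {d for d, ok in verdicts.items() if not ok}
        precision = sum(verdicts.values()) / len(verdicts)
        # Hypothetical tuning rule: tighten when most candidates are rejected,
        # loosen when at least half are confirmed.
        k = k + step if precision < 0.5 else max(k - step, step)
    return confirmed
```

With Prophet itself, the band would instead come from the `yhat_lower` and `yhat_upper` columns of a fitted model's forecast, while the surrounding validate-and-retune loop would stay the same.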

Paper Structure

This paper contains 18 sections, 2 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Hurricane Katrina -- dispersion signals: a visualization of the signals throughout the coverage of the hurricane. The lines correspond to LLM (purple), plot elements (red), topics (blue), and entities (green). The x-axis marks the dates and the y-axis marks the daily dispersion level (the trace).
  • Figure 2: Media storms durations -- "In-Period"
  • Figure 3: Media storms durations -- "Out-Period"
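Figures 2 and 3 report storm durations. One simple way such durations could be tallied is sketched below. It assumes that a storm is a maximal run of consecutive expert-confirmed days; the exact rule for a storm's start and end is not given in this summary, and `storm_durations` is a hypothetical helper.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List

def storm_durations(storm_days: Iterable[date]) -> List[int]:
    """Group confirmed storm days into maximal runs of consecutive dates.

    Assumption: one storm == one unbroken run of days; returns each run's length in days.
    """
    days = sorted(set(storm_days))
    durations: List[int] = []
    run = 0
    for i, day in enumerate(days):
        if i > 0 and day - days[i - 1] == timedelta(days=1):
            run += 1  # the same storm continues
        else:
            if run:
                durations.append(run)  # close the previous storm
            run = 1  # a new storm starts
    if run:
        durations.append(run)
    return durations

# Toy usage: duration (in days) -> number of storms with that duration.
confirmed = [date(2000, 1, 1), date(2000, 1, 2), date(2000, 1, 3), date(2000, 1, 10)]
histogram = Counter(storm_durations(confirmed))
```

Under this rule a single quiet day splits a storm in two; allowing short gaps inside a run is a natural variant.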