The COVID That Wasn't: Counterfactual Journalism Using GPT

Sil Hamilton; Andrew Piper

TL;DR

This work introduces counterfactual journalism: GPT-2 is fine-tuned on pre-COVID CBC text and then conditioned on actual CBC headlines to generate COVID-19 articles, enabling a direct comparison with real CBC coverage. The authors collect a CBC COVID-19 corpus, train GPT-2 models in the pre-COVID CBC style, and generate 5,082 counterfactual articles per model, evaluating them with sentiment, NER, focus, and keyword analyses. They find that the counterfactual outputs are more negative and less geopolitically framed than CBC coverage, while CBC coverage shifts toward local health framing and person-centered storytelling. The study demonstrates that LLMs can serve as diagnostic tools for exploring how editorial choices shape public discourse, and it suggests future directions in domain exploration, audience modeling, and cautious predictive uses of textual simulation.

Abstract

In this paper, we explore the use of large language models to assess human interpretations of real-world events. To do so, we use a language model trained prior to 2020 to artificially generate news articles concerning COVID-19 given the headlines of actual articles written during the pandemic. We then compare stylistic qualities of our artificially generated corpus with a news corpus, in this case 5,082 articles produced by CBC News between January 23 and May 5, 2020. We find that our artificially generated articles exhibit a considerably more negative attitude towards COVID and a significantly lower reliance on geopolitical framing. Our methods and results hold importance for researchers seeking to simulate large-scale cultural processes via recent breakthroughs in text generation.

Paper Structure (50 sections, 4 equations, 4 figures, 2 tables)

Introduction
Background
Method
Corpus
Language Model
Fine-tuning
Training Dataset
Training
Model Hyperparameters
Standard Context
Static Context
Rolling Context
Temperature
Models
News Article Generation
...and 35 more sections

Figures (4)

Figure 1: Correlation of sentiment in pre-COVID CBC and GPT articles over a ten year period.
Figure 2: Averaged weekly article sentiment over the first four months of the pandemic.
Figure 3: Average values of given entity types in CBC & GPT articles over the first four months of the pandemic for Model 3.
Figure 4: Relationship between focus and sentiment in CBC articles. Focus values are normalized.
