The COVID That Wasn't: Counterfactual Journalism Using GPT
Sil Hamilton, Andrew Piper
TL;DR
This work introduces counterfactual journalism by fine-tuning GPT-2 on pre-COVID CBC text to generate COVID-19 articles conditioned on actual CBC headlines, enabling a direct comparison with real CBC coverage. The authors collect a CBC COVID-19 corpus, train a GPT-2 model in the pre-COVID CBC style, and generate 5,082 counterfactual articles per model, evaluating them with sentiment, NER, focus, and keyword analyses. They find that the counterfactual outputs are more negative and less geopolitically framed than CBC coverage, while CBC coverage shifts toward local health framing and person-centered storytelling. The study demonstrates that LLMs can serve as diagnostic tools for exploring how editorial choices shape public discourse, and it suggests future directions in domain exploration, audience modeling, and cautious predictive uses of textual simulation.
Abstract
In this paper, we explore the use of large language models to assess human interpretations of real-world events. To do so, we use a language model trained prior to 2020 to artificially generate news articles concerning COVID-19 given the headlines of actual articles written during the pandemic. We then compare stylistic qualities of our artificially generated corpus with a news corpus, in this case 5,082 articles produced by CBC News between January 23 and May 5, 2020. We find that our artificially generated articles exhibit a considerably more negative attitude toward COVID and a significantly lower reliance on geopolitical framing. Our methods and results hold importance for researchers seeking to simulate large-scale cultural processes via recent breakthroughs in text generation.
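The corpus-level comparison described above can be illustrated with a minimal sketch. It assumes a toy word lexicon and placeholder texts, neither of which comes from the paper; the sentiment tool the authors actually used is not specified in this abstract, and the names `TOY_LEXICON`, `sentiment_score`, `corpus_sentiment`, and `sentiment_gap` are hypothetical.

```python
from statistics import mean

# Hypothetical toy lexicon for illustration only; not the paper's sentiment tool.
TOY_LEXICON = {
    "recover": 1.0, "safe": 1.0, "hope": 1.0,
    "death": -1.0, "fear": -1.0, "outbreak": -1.0,
}


def sentiment_score(text):
    """Return the mean lexicon score of the tokens found in the lexicon (0.0 if none)."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    hits = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    return mean(hits) if hits else 0.0


def corpus_sentiment(articles):
    """Average the per-article sentiment scores across a corpus."""
    return mean(sentiment_score(a) for a in articles)


def sentiment_gap(generated, real):
    """Generated-minus-real mean sentiment; negative means the generated corpus is more negative."""
    return corpus_sentiment(generated) - corpus_sentiment(real)


if __name__ == "__main__":
    # Placeholder texts, not drawn from either corpus.
    generated = ["Fear spreads as the outbreak grows."]
    real = ["Officials hope residents stay safe."]
    print(sentiment_gap(generated, real))
```

In this sketch, a negative gap would correspond to the direction of the finding reported above, namely that the generated corpus is more negative than the real one. A real analysis would replace the toy lexicon with an established sentiment model and test whether the difference is statistically significant.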
