NELA-PS: A Dataset of Pink Slime News Articles for the Study of Local News Ecosystems

Benjamin D. Horne, Maurício Gruppi

Abstract

Pink slime news outlets automatically produce low-quality, often partisan content that is framed as authentic local news. Given that Americans trust local news and that local outlets are increasingly shutting down due to financial distress, pink slime news outlets have the potential to exploit local information voids. Yet there are gaps in our understanding of pink slime production practices and tactics, particularly over time. To support future research in this area, we built a dataset of over 7.9M articles from 1093 pink slime sources over 2.5 years.

Paper Structure (19 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: Flow diagram of the article collection system. Note that this is the same collection system that was developed in [norregaard2019nela] and used in [horne2022nela]. The flow graphic is taken from [horne2022nela].
  • Figure 2: (a) Number of articles published per month per pink slime network. Note that some networks produce so little compared to Metric Media that they are not easily seen in the figure. (b) Article density per state over the full dataset, where darker red indicates higher density. (c) Outlet density per state, where darker blue indicates higher density. Not shown in either map are Alaska, with 8 outlets and 0.47% of the articles, and Hawaii, with 6 outlets and 0.24% of the articles.
  • Figure 3: Comparisons between the NELA-Local (LN) dataset (column 1) and the NELA-PS (PS) dataset (column 2) over the same time frame. In (a) and (b), we show the distributions of the number of articles per outlet. In (c) and (d), we show the timeline of the number of articles published per day. In (e) and (f), we show the log distribution of the number of words per article. Note the difference in the x- and y-axis scales between the columns.
  • Figure 4: Content sharing network across time, where colors indicate whether an outlet is a pink slime outlet (colored and annotated in red) or an authentic local news outlet (colored and annotated in green). Nodes represent outlets. Node size is based on how many articles are copied from that source, and edges are directed and weighted in the direction of information flow (A $\rightarrow$ B means B copied from A). We show these networks across three 5-month subsets: articles published between May 2021 and September 2021, between October 2021 and February 2022, and between March 2022 and August 2022. Note that the primary bridge node between the pink slime and local news groups is AP News (colored and annotated in blue). The layout of each network is generated by the ForceAtlas2 layout algorithm in Gephi [bastian2009gephi, jacomy2014forceatlas2]. This figure is best viewed in color.
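
The edge and node-size conventions in the Figure 4 caption can be illustrated with a minimal sketch. It assumes copy events have already been detected and are available as (source, copier) pairs; how copied articles are identified is not described in this section. The function name `build_sharing_network` and the outlet names other than AP News are hypothetical.

```python
from collections import Counter


def build_sharing_network(copy_events):
    """Build a directed, weighted content sharing network.

    copy_events: iterable of (source_outlet, copying_outlet) pairs,
    one pair per copied article (assumed input format).
    Returns (edge_weights, node_sizes):
      edge_weights[(A, B)] = number of articles B copied from A (edge A -> B)
      node_sizes[A]        = number of articles copied from A by any outlet
    """
    edge_weights = Counter()
    node_sizes = Counter()
    for source, copier in copy_events:
        if source == copier:
            # Ignore self-copies; they carry no cross-outlet information flow.
            continue
        edge_weights[(source, copier)] += 1
        node_sizes[source] += 1
        # Make sure outlets that only copy still appear as nodes (size 0).
        node_sizes.setdefault(copier, 0)
    return edge_weights, node_sizes


# Hypothetical copy events: "Outlet A" and "Outlet B" are placeholders.
events = [
    ("AP News", "Outlet A"),
    ("AP News", "Outlet B"),
    ("Outlet A", "Outlet B"),
    ("AP News", "Outlet A"),
]
edge_weights, node_sizes = build_sharing_network(events)
```

In this sketch an edge key `(A, B)` follows the caption's convention that A $\rightarrow$ B means B copied from A, so a frequently copied source such as a wire service ends up with the largest node size.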