
3DLNews: A Three-decade Dataset of US Local News Articles

Gangani Ariyarathne, Alexander C. Nwala

TL;DR

3DLNews addresses the lack of long-span datasets for US local news by compiling roughly 1 million URLs from over 14,000 local outlets across all 50 states between 1996 and 2024, collected by scraping Google and Twitter search results. Each URL is enriched with metadata such as publication date, outlet geolocation, and HTML content, and a transparent filtering pipeline separates news articles from other links. The dataset enables analyses of local news nationalization, bias, news deserts, and community perspectives, and it is released with both raw and filtered data to support reproducibility. The resource covers about 68% of US counties and offers a foundation for studying the US local media ecosystem and its evolution over three decades.

Abstract

We present 3DLNews, a novel dataset of local news articles from the United States spanning 1996 to 2024. It contains almost 1 million URLs (with HTML text) from over 14,000 local newspapers, TV stations, and radio stations across all 50 states, and provides a broad snapshot of the US local news landscape. The dataset was collected by scraping Google and Twitter search results. We employed a multi-step filtering process to remove links that are not news articles, and enriched the dataset with metadata such as the names and geo-coordinates of the source news media organizations and article publication dates. Finally, we demonstrate the utility of 3DLNews by outlining four applications.
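To make the per-article metadata described above concrete, the following Python sketch shows one possible record layout and how per-year article counts, the quantity plotted in Figure 2, could be derived from such records. The class `LocalNewsRecord`, its field names, the helper `articles_per_year`, and the sample URLs are all hypothetical; they are assumptions for illustration and do not describe the released 3DLNews file format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class LocalNewsRecord:
    """Hypothetical per-article record; field names are illustrative, not the released schema."""
    url: str
    outlet_name: str                  # name of the source news organization
    outlet_lat: float                 # outlet latitude (geo-coordinates of the source)
    outlet_lon: float                 # outlet longitude
    publication_date: Optional[date]  # None when no publication date is available
    html: str                         # HTML text of the article page


def articles_per_year(records: Iterable[LocalNewsRecord]) -> dict[int, int]:
    """Count dated articles per publication year, skipping undated records."""
    counts = Counter(
        r.publication_date.year for r in records if r.publication_date is not None
    )
    return dict(sorted(counts.items()))


# Usage with made-up sample records (example.com URLs are placeholders).
sample = [
    LocalNewsRecord("https://example.com/a", "Example Gazette", 36.85, -76.29, date(1998, 5, 1), "<html></html>"),
    LocalNewsRecord("https://example.com/b", "Example Gazette", 36.85, -76.29, date(2020, 1, 9), "<html></html>"),
    LocalNewsRecord("https://example.com/c", "Example Gazette", 36.85, -76.29, None, "<html></html>"),
]
print(articles_per_year(sample))  # {1998: 1, 2020: 1}
```

Undated records are skipped rather than assigned a default year, so that per-year counts reflect only articles whose publication date is known.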

Paper Structure

This paper contains 16 sections, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: Local news articles in 3DLNews per US county. Black-colored counties indicate areas without news articles in 3DLNews.
  • Figure 2: Counts of news articles in 3DLNews per year.