What's happening in your neighborhood? A Weakly Supervised Approach to Detect Local News

Deven Santosh Shah, Shiying He, Gosuddin Kamaruddin Siddiqi, Radhika Bansal

TL;DR

This work tackles local news detection as the first step in local-news recommendation, proposing a multilingual, weakly supervised classifier built on XLM-RoBERTa with a CNN-based n-gram head and content-based features. It combines label correction via publisher-affinity-derived gap ratios $g_c = 1 - \frac{n_c}{n_{\max}}$ and GPT-3-driven translation transfer learning with data augmentation (distant supervision and NMT) to scale across languages. The approach yields higher precision and recall than baselines on real-world data and delivers measurable online engagement gains in a two-week A/B test, validating its practical impact for local journalism and neighborhood information delivery. Overall, the method advances local news detection and demonstrates scalable, cross-lingual applicability for content-based local recommendations.
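The gap ratio in the summary above can be computed directly from a set of counts. The sketch below is a minimal illustration, assuming that $n_c$ is a non-negative count attached to some key $c$ and that $n_{\max}$ is the largest such count. The summary does not define either quantity, so the function name `gap_ratios`, the dictionary keys, and the example numbers are hypothetical.

```python
from typing import Dict


def gap_ratios(counts: Dict[str, int]) -> Dict[str, float]:
    """Return g_c = 1 - n_c / n_max for every key c in `counts`.

    Assumption: n_max is the largest count in `counts`.
    """
    if not counts:
        return {}
    n_max = max(counts.values())
    if n_max <= 0:
        raise ValueError("at least one count must be positive")
    return {c: 1.0 - n / n_max for c, n in counts.items()}


# Hypothetical counts, for illustration only (not data from the paper).
example = {"a": 80, "b": 20, "c": 0}
ratios = gap_ratios(example)
```

Under this reading, the key with the largest count gets a gap ratio of 0, and a key with a zero count gets a gap ratio of 1.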

Abstract

Local news articles are a subset of news that impact users in a geographical area, such as a city, county, or state. Detecting local news (Step 1) and subsequently determining its geographical location and radius of impact (Step 2) are two important steps towards accurate local news recommendation. Naive rule-based methods, such as detecting city names in the news title, tend to give erroneous results because they do not understand the news content. Building on recent developments in natural language processing, we develop an integrated pipeline that enables automatic local news detection and content-based local news recommendations. In this paper, we focus on Step 1 of the pipeline, which features (1) a weakly supervised framework that incorporates domain knowledge and automatic data processing, and (2) scalability to multilingual settings. Compared with the Stanford CoreNLP NER model, our pipeline achieves higher precision and recall on a real-world, human-labeled dataset. This pipeline has the potential to deliver more precise local news to users, help local businesses get more exposure, and give people more information about their neighborhood safety.
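To see why title matching alone is brittle, consider a minimal sketch of such a rule. The city list, the function name `naive_is_local`, and the example titles are all made up for illustration; this is not the baseline evaluated in the paper.

```python
import re
from typing import Iterable

# Hypothetical gazetteer; a real one would be far larger.
CITY_NAMES = ("Paris", "Phoenix", "Seattle")


def naive_is_local(title: str, cities: Iterable[str] = CITY_NAMES) -> bool:
    """Flag a title as local news if it contains any city name as a whole word."""
    return any(re.search(rf"\b{re.escape(city)}\b", title) for city in cities)


# Invented titles showing where the rule goes wrong and where it works.
false_positive = naive_is_local("Paris Hilton launches new fragrance line")
false_negative = naive_is_local("Water main break closes Main Street downtown")
true_positive = naive_is_local("Seattle council votes on new bike lanes")
```

The first title is flagged only because a person's name matches a city name, and the second is missed because it names no city at all, even though it reads like neighborhood news. Both errors come from matching strings without looking at what the article is about.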

Paper Structure (25 sections, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Local News Recommendation
  • Figure 2: Architecture of Multi-lingual Local News Classifier
  • Figure 3: Example of Back-and-Forth Translation