Read Between the Lines: A Benchmark for Uncovering Political Bias in Bangla News Articles

Nusrat Jahan Lia, Shubhashis Roy Dipta, Abdullah Khan Zehady, Naymul Islam, Madhusodan Chakraborty, Abdullah Al Wasif

TL;DR

This work addresses the scarcity of Bangla political stance datasets by introducing BanglaBias, the first annotated benchmark of 200 Bangla news articles labeled as Government Leaning, Government Critique, or Neutral. It details an end-to-end pipeline—from event selection and crawling to human annotation and multi-model evaluation across 28 LLMs—to study how model size and prompting influence stance detection in a low-resource, culturally nuanced setting. Key findings show larger models excel at identifying government critique but struggle with neutral content, and there is a systematic bias toward polarized predictions, particularly in smaller models. The dataset and diagnostic analyses offer a foundation for advancing stance detection in Bangla media, guiding future data collection, model fine-tuning, and bias-aware evaluation in low-resource languages with broad practical implications for media analysis and fairness.

Abstract

Detecting media bias is crucial, especially in the South Asian region. Despite this, annotated datasets and computational studies for Bangla political bias research remain scarce. Crucially, political stance detection in Bangla news requires an understanding of linguistic cues, cultural context, subtle biases, rhetorical strategies, code-switching, implicit sentiment, and socio-political background. To address this, we introduce the first benchmark dataset of 200 politically significant and highly debated Bangla news articles, labeled for government-leaning, government-critique, and neutral stances, alongside diagnostic analyses for evaluating large language models (LLMs). Our comprehensive evaluation of 28 proprietary and open-source LLMs shows strong performance in detecting government-critique content (F1 up to 0.83) but substantial difficulty with neutral articles (F1 as low as 0.00). Models also tend to over-predict government-leaning stances, often misinterpreting ambiguous narratives. This dataset and its associated diagnostics provide a foundation for advancing stance detection in Bangla media research and offer insights for improving LLM performance in low-resource languages.
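The per-class F1 scores quoted above can be illustrated with a minimal sketch. It assumes the standard one-vs-rest definition of F1; the abstract does not say how the authors computed their scores. The function name `per_class_f1` and the toy labels are hypothetical and are not taken from the benchmark. The example shows how a model that never predicts the neutral label ends up with a neutral F1 of 0.00.

```python
# Illustrative only: toy labels, not data from the benchmark.
LABELS = ["Government Leaning", "Government Critique", "Neutral"]


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1}, treating each label one-vs-rest."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# A hypothetical model that pushes every neutral article to a polarized label.
y_true = ["Government Critique", "Neutral", "Government Leaning", "Neutral"]
y_pred = ["Government Critique", "Government Leaning",
          "Government Leaning", "Government Critique"]

scores = per_class_f1(y_true, y_pred)
for label, f1 in scores.items():
    print(f"{label}: {f1:.2f}")
```

In this toy case both neutral articles are missed, so the neutral class has no true positives and its F1 is zero, while the two polarized classes each score about 0.67.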

Paper Structure

This paper contains 33 sections, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Overview of the political stance detection study (growing resources for English vs. lack of Bangla resources). We introduce a benchmark of 200 news articles (on politically debatable events) annotated with Government Leaning, Critique, and Neutral labels. We then evaluate the performance of 28 LLMs in detecting political stance in Bengali. Performance improves with model size, with Massive and Proprietary models achieving the highest F1-scores, but neutral detection remains weak. Bars for Nano models and the Neutral label show noticeably larger error ranges across models, indicating unstable performance.
  • Figure 2: Distribution of three classes across the dataset: 95 Govt. Critique (47.5%), 72 Neutral (36.0%), and 33 Govt. Leaning (16.5%).
  • Figure 3: Radar plots showing tendencies of models (per-category) to favor particular labels, relative to the true distribution. The black polygon in each radar plot denotes the true distribution of labels and serves as the baseline.
  • Figure 4: Aggregated Confusion Heatmap over five categories of models: Nano (3 models), Compact (7 models), Standard (6 models), Massive (7 models) and Proprietary (5 models).
  • Figure 5: List of 46 politically debatable events (spanning diverse news coverage) included in our benchmark dataset to capture multiple perspectives.
  • ...and 1 more figure
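The counts in the captions for Figures 2 and 4 can be cross-checked with a short sketch. The numbers are copied from the captions. The majority-class baseline is an added illustration, not a result reported in this summary, and the variable names are hypothetical.

```python
# Class counts from the Figure 2 caption.
class_counts = {"Govt. Critique": 95, "Neutral": 72, "Govt. Leaning": 33}
total_articles = sum(class_counts.values())

# Share of each class as a percentage of the dataset.
class_shares = {label: 100 * n / total_articles for label, n in class_counts.items()}

# Assumption for illustration: a trivial baseline that always predicts the
# most frequent class. Its accuracy equals that class's share of the data.
majority_label = max(class_counts, key=class_counts.get)
majority_accuracy = class_counts[majority_label] / total_articles

# Model counts per size category from the Figure 4 caption.
models_per_category = {
    "Nano": 3, "Compact": 7, "Standard": 6, "Massive": 7, "Proprietary": 5,
}
total_models = sum(models_per_category.values())

print(total_articles, class_shares)
print(majority_label, majority_accuracy)
print(total_models)
```

The totals agree with the figures stated elsewhere in the summary: 200 articles and 28 models. Because nearly half of the articles are Govt. Critique, per-class scores are more informative than overall accuracy on this benchmark.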