DocNet: Semantic Structure in Inductive Bias Detection Models

Jessica Zhu, Iain Cruickshank, Michel Cukier

TL;DR

DocNet proposes a low-resource, inductive approach to political bias detection that relies on per-article word co-occurrence graphs rather than pretrained language models. By constructing undirected graphs and exploring multiple unsupervised embeddings (Graph2Vec, GAEs, Doc2Vec, and SBERT for comparison), the method demonstrates that semantic structure alone can yield competitive bias predictions, with domain and label aggregation enabling analysis at both article and domain levels. The study uses four datasets (AFG, OATH, VAX, BASIL) and benchmarks against LLMs and a naive baseline, showing that DocNet can match or exceed many baselines while remaining resource-efficient. Overall, DocNet highlights the value of semantic structure in news bias detection, offering a scalable, language-agnostic pathway for low-resource settings and broader accessibility to bias-aware media consumption.
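
As a rough illustration of the per-article word co-occurrence graphs mentioned above, the sketch below builds an undirected, weighted graph from one article's text using only the Python standard library. The regex tokenizer, the sliding-window size, and the function name `cooccurrence_graph` are assumptions made for this example; the summary does not specify the preprocessing or window settings DocNet actually uses.

```python
import re
from collections import defaultdict


def cooccurrence_graph(text, window=2):
    """Build an undirected, weighted word co-occurrence graph.

    Hypothetical helper for illustration only: tokenization and window
    size are assumptions, not the settings used in the paper.
    Returns a dict mapping frozenset({word_a, word_b}) -> co-occurrence count.
    """
    # Lowercase and keep alphabetic tokens (assumed preprocessing).
    tokens = re.findall(r"[a-z']+", text.lower())
    edges = defaultdict(int)
    for i, word in enumerate(tokens):
        # Link each word to the next `window` words that follow it.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:  # skip self-loops
                # frozenset makes the edge undirected: {a, b} == {b, a}.
                edges[frozenset((word, other))] += 1
    return dict(edges)


article = "The senator praised the bill. The bill passed."
graph = cooccurrence_graph(article, window=1)

# Nodes are simply the words that appear in at least one edge.
nodes = set().union(*graph)
print(len(nodes), "nodes,", len(graph), "edges")
```

Each article yields its own small graph, so a new article can be embedded without rebuilding a corpus-wide graph, which is consistent with the inductive setting described above. A graph embedding method such as Graph2Vec would then take these per-article graphs as input.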

Abstract

News will be biased so long as people have opinions. As social media becomes the primary entry point for news and partisan differences increase, it is increasingly important for informed citizens to be able to recognize bias. If people are aware of the biases of the news they consume, they will be able to take action to avoid polarizing echo chambers. In this paper, we explore an often overlooked aspect of bias detection in media: the semantic structure of news articles. We present DocNet, a novel, inductive, and low-resource document embedding and political bias detection model. We also demonstrate that the semantic structure of news articles from opposing political sides, as represented in document-level graph embeddings, has significant similarities. DocNet bypasses the need for pre-trained language models, reducing resource dependency while achieving comparable performance. It can be used to advance political bias detection in low-resource environments. Our code and data are made available at: https://anonymous.4open.science/r/DocNet/

Paper Structure

This paper contains 27 sections, 2 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Pipeline of DocNet Methodology (bold) and Additional Experiments.
  • Figure 2: Correlation Plot of Graph Metrics and Bias from Topic Aligned Dataset
  • Figure 3: Correlation Plot of Graph Metrics and Bias from BASIL (Art* is article bias, D* is domain bias)
  • Figure 4: Graphs of articles with high probability predictions from BASIL. Predictions from Graph2Vec: Word Nodes at the domain level using binary labels (Macro F-1 = 0.82). Source domains: a) NYTimes, b) Fox, c) Huffington Post, d) NYTimes
  • Figure 5: Graphs of articles with high probability predictions from the VAX dataset. Predictions from Graph2Vec: Word Nodes using binary labels (Macro F-1 = 0.75). Source domains: a) ABC, b) AFN, c) Toronto Sun, d) MSN
  • ...and 1 more figure