IndiVec: An Exploration of Leveraging Large Language Models for Media Bias Detection with Fine-Grained Bias Indicators

Luyang Lin, Lingzhi Wang, Xiaoyan Zhao, Jing Li, Kam-Fai Wong

TL;DR

This paper tackles media bias detection by critiquing dataset-specific fine-tuning and proposing IndiVec, a general framework that builds a fine-grained bias indicator database using instruction-following LLMs. It replaces pure classification with a descriptor-based matching and majority-vote mechanism over a vector DB of indicators, enabling better out-of-domain generalization and explicit explanations via top-k indicators. The approach yields strong performance across four political bias datasets and demonstrates robustness to data imbalance, with extensive ablations showing the contributions of indicator construction, verification, and mapping strategies. Practical impact lies in a scalable, explainable bias detection tool that can generalize across sources and aid human annotators in bias labeling and dataset refinement.

Abstract

This study focuses on media bias detection, crucial in today's era of influential social media platforms shaping individual attitudes and opinions. In contrast to prior work that primarily relies on training specific models tailored to particular datasets, resulting in limited adaptability and subpar performance on out-of-domain data, we introduce a general bias detection framework, IndiVec, built upon large language models. IndiVec begins by constructing a fine-grained media bias database, leveraging the robust instruction-following capabilities of large language models and vector database techniques. When confronted with new input for bias detection, our framework automatically selects the most relevant indicators from the vector database and employs majority voting over them to determine the input's bias label. IndiVec excels compared to previous methods due to its adaptability (demonstrating consistent performance across diverse datasets from various sources) and explainability (providing explicit top-k indicators to interpret bias predictions). Experimental results on four political bias datasets highlight IndiVec's significant superiority over baselines. Furthermore, additional experiments and analysis provide profound insights into the framework's effectiveness.
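
The retrieval-and-vote step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the indicator texts, the `biased` / `non-biased` label names, the three-dimensional vectors, and the names `INDICATOR_DB`, `cosine`, and `predict_bias` are all hypothetical, and a real system would presumably obtain the vectors from a text embedding model and store them in a vector database. The sketch ranks stored indicators by cosine similarity to an input descriptor, takes the top k, and returns the majority label together with the matched indicators as the explanation.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy indicator database: each entry pairs an indicator text
# with a bias label and a made-up embedding vector (assumption, not real data).
INDICATOR_DB = [
    {"text": "uses loaded, emotionally charged wording",
     "label": "biased", "vector": [0.9, 0.1, 0.0]},
    {"text": "omits the opposing side's position",
     "label": "biased", "vector": [0.8, 0.3, 0.1]},
    {"text": "attributes claims to named sources",
     "label": "non-biased", "vector": [0.1, 0.9, 0.2]},
    {"text": "reports figures without evaluative language",
     "label": "non-biased", "vector": [0.0, 0.8, 0.5]},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_bias(descriptor_vec, indicator_db, k=3):
    """Return (majority label, top-k indicator texts) for one descriptor."""
    # Rank every stored indicator by similarity to the input descriptor.
    ranked = sorted(
        indicator_db,
        key=lambda item: cosine(descriptor_vec, item["vector"]),
        reverse=True,
    )
    top_k = ranked[:k]
    # Majority vote over the labels of the k nearest indicators.
    votes = Counter(item["label"] for item in top_k)
    label = votes.most_common(1)[0][0]
    return label, [item["text"] for item in top_k]


# Made-up descriptor vector standing in for an embedded input sentence.
label, explanation = predict_bias([0.85, 0.2, 0.05], INDICATOR_DB, k=3)
print(label)
for text in explanation:
    print(" -", text)
```

An odd k avoids ties between two labels. The returned indicator texts play the role of the top-k explanation the abstract mentions.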

Paper Structure

This paper contains 38 sections, 2 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Our IndiVec Bias Prediction Framework.
  • Figure 2: Statistics of Constructed Indicator Set.
  • Figure 3: Performance Across Different Indicator Vector Database Sizes (a) and Varied Base Datasets for Indicator Construction (b).
  • Figure 4: (a) Visualization of 50 randomly sampled instances (Sentence, corresponding Descriptor and Top 5 ranked Indicators). (b) Visualization of top 50 and last 50 ranked indicators for a randomly selected instance with four Descriptors.