Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions
Yifan Liu, Yike Li, Dong Wang
TL;DR
The paper addresses the fragmentation caused by single-dimension media bias benchmarks by introducing a cross-platform, multi-domain dataset of YouTube and Reddit content, collected over five years and annotated for multiple bias dimensions. It analyzes inter-dimension correlations and temporal dynamics using both manual annotations and automated labeling with shallow models and a large language model, highlighting domain-specific bias expressions and event-driven surges. Key contributions include the first joint labeling of multiple bias dimensions across five domains, a comprehensive correlation and time-series analysis, and recommendations for adaptive multi-task learning that exploits strong correlations between bias dimensions. The work advances bias identification by providing a resource and insights that support more robust, time-aware, multi-dimensional detection systems, with practical implications for fairer media consumption and ethical journalism.
Abstract
Media bias significantly shapes public perception by reinforcing stereotypes and exacerbating societal divisions. Prior research has often focused on isolated media bias dimensions, such as political bias or racial bias, neglecting the complex interrelationships among bias dimensions across different topic domains. Moreover, we observe that models trained on existing media bias benchmarks generalize poorly to recent social media posts, particularly in certain bias identification tasks. This shortfall arises primarily because these benchmarks do not adequately reflect the rapidly evolving nature of social media content, which is characterized by shifting user behaviors and emerging trends. In response to these limitations, we introduce a novel dataset collected from YouTube and Reddit over the past five years. Our dataset includes automated annotations for YouTube content across a broad spectrum of bias dimensions, such as gender, racial, and political biases, as well as hate speech, among others. It spans diverse domains including politics, sports, healthcare, education, and entertainment, reflecting the complex interplay of biases across different societal sectors. Through comprehensive statistical analysis, we identify significant differences in bias expression patterns and intra-domain bias correlations across these domains. Building on these correlations among bias dimensions, we lay the groundwork for advanced systems capable of detecting multiple biases simultaneously. Overall, our dataset advances the field of media bias identification, contributing to the development of tools that promote fairer media consumption. A comprehensive awareness of existing media bias fosters more ethical journalism, promotes cultural sensitivity, and supports a more informed and equitable public discourse.
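
As an illustration of what a correlation between bias dimensions means in practice, the sketch below computes pairwise phi coefficients (Pearson correlation on binary labels) for a handful of posts. This is a minimal example under stated assumptions, not the authors' analysis pipeline: the abstract does not specify the correlation measure, and the function names and toy labels are hypothetical.

```python
from itertools import combinations
from math import sqrt


def phi(x, y):
    """Pearson correlation of two equal-length 0/1 label lists (phi coefficient)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A dimension that never varies has no defined correlation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def pairwise_phi(labels):
    """Map each unordered pair of dimension names to its phi coefficient."""
    return {
        (a, b): phi(labels[a], labels[b])
        for a, b in combinations(sorted(labels), 2)
    }


# Hypothetical toy labels: one 0/1 flag per post and bias dimension.
toy_labels = {
    "gender":    [1, 0, 1, 0, 0, 1],
    "racial":    [1, 0, 1, 0, 1, 1],
    "political": [0, 1, 0, 1, 1, 0],
}

for pair, value in pairwise_phi(toy_labels).items():
    print(pair, round(value, 3))
```

Pairs with a large absolute coefficient are the natural candidates for shared representations in a multi-task detector, while near-zero pairs suggest the dimensions are better modeled separately.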
