Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech

Jonathan Pofcher, Christopher M. Homan, Randall Sell, Ashiqur R. KhudaBukhsh

TL;DR

This study analyzes LGBTQ+ coverage in major US cable news and the public's YouTube responses, introducing a four-way hope speech classifier and a carefully annotated dataset gathered from politically diverse raters. It reveals substantial annotator subjectivity linked to political beliefs and demonstrates that such biases can propagate into fine-tuned language models, particularly when trained on liberal-leaning annotations. The in-the-wild analysis shows negative commentary toward LGBTQ+ topics dominates across outlets, though the degree of negativity and positivity varies by channel. The work underscores the need for bias-aware, inclusive approaches to online discourse analysis and moderation in order to better protect marginalized communities online.

Abstract

This paper makes three contributions. First, via a substantial corpus of 1,419,047 comments posted on 3,161 YouTube news videos of major US cable news outlets, we analyze how users engage with LGBTQ+ news content. Our analyses focus on both positive and negative content. In particular, we construct a fine-grained hope speech classifier that detects positive (hope speech), negative, neutral, and irrelevant content. Second, in consultation with a public health expert specializing in LGBTQ+ health, we conduct an annotation study with a balanced and diverse political representation and release a dataset of 3,750 instances with fine-grained labels and detailed annotator demographic information. Finally, beyond providing a vital resource for the LGBTQ+ community, our annotation study and subsequent in-the-wild assessments reveal that (1) there is a strong association between raters' political beliefs and how they rate content relevant to a marginalized community; (2) models trained on individual political beliefs exhibit considerable in-the-wild disagreement; and (3) zero-shot large language models (LLMs) align more with liberal raters.
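To make finding (2) concrete, the following minimal sketch shows one way disagreement between two four-way classifiers could be quantified on the same set of comments. It is an illustration only: the function names, the label strings, and the choice of Cohen's kappa as the agreement statistic are assumptions made here, not details taken from the paper.

```python
from collections import Counter

# Label names follow the four classes listed in the abstract; the exact
# strings are an assumption made for this sketch.
LABELS = ("positive", "negative", "neutral", "irrelevant")


def disagreement_rate(preds_a, preds_b):
    """Fraction of items on which two label sequences differ."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def cohen_kappa(preds_a, preds_b):
    """Chance-corrected agreement (Cohen's kappa) between two label sequences."""
    n = len(preds_a)
    observed = 1.0 - disagreement_rate(preds_a, preds_b)
    count_a, count_b = Counter(preds_a), Counter(preds_b)
    # Agreement expected if both models labeled independently
    # according to their own marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in LABELS) / (n * n)
    if expected == 1.0:
        return 1.0  # both models always emit the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical predictions from two models fine-tuned on different rater groups.
model_one = ["positive", "negative", "neutral", "irrelevant"]
model_two = ["positive", "negative", "negative", "irrelevant"]
print(disagreement_rate(model_one, model_two))
print(cohen_kappa(model_one, model_two))
```

A raw disagreement rate is easy to read, while kappa discounts the agreement two models would reach by chance when one label (for example, negative) dominates the data.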

Paper Structure

This paper contains 44 sections, 5 figures, 29 tables.

Figures (5)

  • Figure 1: Corpus annotation pipeline
  • Figure 2: Label breakdown by news outlet with 95% confidence intervals. 50k in-the-wild comments from each news outlet were classified by our best-performing model. Numerical values are listed in Table \ref{tab:in-the-wild}.
  • Figure 3: Prompt for LGBTQ+ Video Classification
  • Figure 4: Prompt used for classification
  • Figure 5: Prompt for 2-Label Classification
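The caption of Figure 2 reports label shares with 95% confidence intervals over 50k classified comments per outlet. The sketch below shows one common way to compute such an interval for a single label share, using the normal (Wald) approximation. The paper's listing here does not say which interval method was used, so the method, the function name, and the example count are all assumptions for illustration.

```python
import math


def proportion_ci(count, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 corresponds to a two-sided 95% interval. This is an assumed
    method; the source does not state how its intervals were computed.
    """
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("need 0 <= count <= total and total > 0")
    p = count / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    # Clamp to [0, 1] because the approximation can overshoot near the edges.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical count: 25,000 of 50,000 comments assigned to one label.
low, high = proportion_ci(25_000, 50_000)
print(low, high)
```

With 50k comments per outlet the interval is narrow, which is why fairly small differences in label share between outlets can still be distinguishable.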