Latent Hatred: A Benchmark for Understanding Implicit Hate Speech

Mai ElSherief, Caleb Ziems, David Muchlinski, Vaishnavi Anupindi, Jordyn Seybolt, Munmun De Choudhury, Diyi Yang

TL;DR

This work addresses the under-explored problem of implicit hate speech by introducing a six-category taxonomy grounded in social science and a large, richly annotated Twitter benchmark that includes fine-grained implicit labels and free-text implied statements. It develops a two-stage annotation process (crowdsourced high-level labeling followed by expert fine-grained labeling), expands the corpus to balance minority classes, and labels the target group and implied meaning of each message. The paper demonstrates that transformer-based models (e.g., BERT) outperform traditional baselines for detection, and it shows promising results for generating explanations of implicit hate with GPT-2, highlighting practical applications for moderation and explainability. It also identifies major challenges in implicit hate detection and outlines future directions for modeling, decoding coded language, and mitigating bias.

Abstract

Hate speech has grown significantly on social media, causing serious consequences for victims of all demographics. Despite much attention being paid to characterizing and detecting discriminatory speech, most work has focused on explicit or overt hate speech, failing to address a more pervasive form based on coded or indirect language. To fill this gap, this work introduces a theoretically justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message and its implication. We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech, and we discuss key features that challenge existing models. This dataset will continue to serve as a useful benchmark for understanding this multifaceted issue.

Paper Structure

This paper contains 20 sections, 1 equation, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Sample posts from our dataset outlining the differences between explicit and implicit hate speech. Explicit hate is direct and leverages specific keywords while implicit hate is more abstract. Explicit text has been modified to include a star (*).
  • Figure 2: Amazon Mechanical Turk interface used to collect ternary annotations (explicit hate, implicit hate, and not hate) for our first stage.
  • Figure 3: Amazon Mechanical Turk interface used to collect the hate target and the implied statement per implicit hate speech post.
  • Figure 4: Instructions and examples provided to Amazon Mechanical Turk workers. Our definition of hate speech is grounded in social media communities' rules.