RV-HATE: Reinforced Multi-Module Voting for Implicit Hate Speech Detection

Yejin Lee, Hyeseon Ahn, Yo-Sub Han

TL;DR

This work tackles implicit hate speech detection in the presence of diverse, dataset-specific linguistic patterns. It introduces RV-HATE, a four-module framework built on contrastive learning extensions, with a reinforcement-learning–driven soft voting mechanism that assigns dataset-specific weights to module outputs. Using modules M0–M3 (clustering-based contrastive learning, target tagging, outlier removal, and hard negatives) together with RL-based weighting, RV-HATE achieves state-of-the-art macro-F1 across five datasets and offers interpretable insight into dataset characteristics. The approach demonstrates that preserving modular specialization and employing dataset-aware voting yields meaningful gains in a domain where improvements are hard to obtain, while also addressing ethical considerations and potential misuse. Overall, RV-HATE advances dataset-aware hate speech detection by combining modular design with adaptive ensembling, enabling robust and explainable performance across varied sources.

Abstract

Hate speech remains prevalent in human society and continues to evolve in its forms and expressions. Advances in internet technology and online anonymity accelerate its spread and complicate its detection. However, hate speech datasets exhibit diverse characteristics, primarily because they are constructed from different sources and platforms, each reflecting different linguistic styles and social contexts. Despite this diversity, prior studies on hate speech detection often rely on fixed methodologies without adapting to data-specific features. We introduce RV-HATE, a detection framework designed to account for the dataset-specific characteristics of each hate speech dataset. RV-HATE consists of multiple specialized modules, where each module focuses on distinct linguistic or contextual features of hate speech. The framework employs reinforcement learning to optimize weights that determine the contribution of each module for a given dataset. A voting mechanism then aggregates the module outputs to produce the final decision. RV-HATE offers two primary advantages: (1) it improves detection accuracy by tailoring the detection process to dataset-specific attributes, and (2) it provides interpretable insights into the distinctive features of each dataset. Consequently, our approach effectively addresses implicit hate speech and achieves superior performance compared to conventional static methods. Our code is available at https://github.com/leeyejin1231/RV-HATE.
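
As a rough illustration of the voting step described in the abstract, the sketch below combines per-module class probabilities with a weighted soft vote. The function name `soft_vote`, the array shapes, and all probability and weight values are assumptions made for illustration. The reinforcement-learning procedure that learns the weights is not shown, and this is not the authors' implementation.

```python
import numpy as np


def soft_vote(module_probs, weights):
    """Weighted soft voting over module outputs (illustrative sketch).

    module_probs: array-like of shape (n_modules, n_samples, n_classes),
                  each module's predicted class probabilities.
    weights:      array-like of shape (n_modules,), non-negative module weights
                  (hypothetical stand-ins for weights learned per dataset).
    Returns (labels, combined), where combined has shape (n_samples, n_classes).
    """
    probs = np.asarray(module_probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    if probs.ndim != 3:
        raise ValueError("module_probs must have shape (modules, samples, classes)")
    if w.ndim != 1 or w.shape[0] != probs.shape[0]:
        raise ValueError("weights must contain one value per module")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")

    w = w / w.sum()                            # normalize so weights sum to 1
    combined = np.tensordot(w, probs, axes=1)  # weighted average over modules
    return combined.argmax(axis=1), combined


# Made-up probabilities for four modules, two posts, classes [non-hate, hate].
example_probs = np.array([
    [[0.1, 0.9], [0.8, 0.2]],   # module 0
    [[0.4, 0.6], [0.7, 0.3]],   # module 1
    [[0.6, 0.4], [0.4, 0.6]],   # module 2
    [[0.8, 0.2], [0.1, 0.9]],   # module 3
])

# Two hypothetical weight vectors, as if learned for two different datasets.
labels_a, _ = soft_vote(example_probs, [0.4, 0.3, 0.2, 0.1])
labels_b, _ = soft_vote(example_probs, [0.1, 0.1, 0.2, 0.6])
```

With these made-up numbers the two weight vectors lead to opposite decisions on both posts, which is the kind of dataset-dependent behavior that per-dataset weighting is meant to capture.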

Paper Structure

This paper contains 48 sections, 8 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: Overall workflow of RV-HATE. The method processes implicit hate speech data through four modules: $\mathtt{M_0}$, $\mathtt{M_1}$, $\mathtt{M_2}$, and $\mathtt{M_3}$. Reinforcement learning is employed to determine the optimal weights for each module in the voting process. RV-HATE calculates module weights for each dataset according to its unique features.
  • Figure 2: Prompt used for identifying whether a hate speech post contains an explicit target.
  • Figure 3: Prompt for verifying labels of the datasets we used.
  • Figure 4: Prompt for NER tagging.
  • Figure 5: Confusion matrices of SharedCon (top row) and RV-HATE (bottom row) on the five hate-speech datasets (IHC, SBIC, DYNA, Hateval, and Toxigen). Each cell reports both the absolute count and the percentage of examples for true negatives, false positives, false negatives and true positives.
  • ...and 4 more figures