Trustworthy Hate Speech Detection Through Visual Augmentation
Ziyuan Yang, Ming Yan, Yingyu Chen, Hui Wang, Zexin Lu, Yi Zhang
TL;DR
Hate speech detection is challenged by short, context-sparse text and by the uncertainty that subjective judgments introduce. The paper introduces TrusV-HSD, which augments text with imagery generated from the text itself, so no paired data is needed, and fuses text and image cues through a lightweight detector. A trustworthy loss quantifies and moderates uncertainty via Dirichlet-based reasoning. Empirical results on SE, FNUC, and IHC show superior F1 and robustness, and ablations confirm the contributions of the imagery cues, the connection block, and the trustworthy loss. The Mamba block enables scalable long-range multimodal interactions and yields improvements across backbones such as BERT and fBERT. The work advances data-efficient, reliable multimodal hate speech detection while acknowledging and addressing the ethical considerations and biases inherent in image generation and dataset composition, and it suggests user-specific features as a direction for future work.
Abstract
The surge of hate speech on social media platforms poses a significant challenge, making hate speech detection (HSD) increasingly critical. Current HSD methods focus on enriching contextual information to enhance detection performance, but they overlook the inherent uncertainty of hate speech. We propose a novel HSD method, trustworthy hate speech detection through visual augmentation (TrusV-HSD), which enhances semantic information by integrating diffused visual images and mitigates uncertainty with a trustworthy loss. TrusV-HSD learns semantic representations by extracting trustworthy information through multi-modal connections, without requiring paired data. Our experiments on public HSD datasets demonstrate the effectiveness of TrusV-HSD, showing remarkable improvements over conventional methods.
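
To make the idea of Dirichlet-based uncertainty concrete, the following is a minimal sketch of how a classifier's outputs can be turned into per-class belief plus an explicit uncertainty mass, following the standard subjective-logic formulation used in evidential deep learning. It is an illustration under stated assumptions, not the TrusV-HSD loss itself: the softplus evidence function, the function names, and the two-class example are all hypothetical choices made here.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable softplus; maps any real logit to non-negative evidence.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dirichlet_opinion(logits):
    """Convert raw logits into belief masses and an uncertainty mass.

    Assumption: evidence = softplus(logit) and alpha = evidence + 1,
    the common evidential-learning parameterization (not taken from the paper).
    """
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]           # Dirichlet parameters
    strength = sum(alpha)                         # Dirichlet strength S
    num_classes = len(alpha)
    belief = [e / strength for e in evidence]     # b_k = e_k / S
    uncertainty = num_classes / strength          # u = K / S
    expected_prob = [a / strength for a in alpha] # mean of the Dirichlet
    return belief, uncertainty, expected_prob


# Hypothetical two-class example (non-hate vs. hate).
confident = dirichlet_opinion([0.0, 12.0])   # strong evidence for class 1
ambiguous = dirichlet_opinion([-20.0, -20.0])  # almost no evidence at all
```

In this formulation the belief masses and the uncertainty mass always sum to one, so an input that yields little evidence for any class is reported as uncertain rather than being forced into a confident prediction.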
