DeHate: A Stable Diffusion-based Multimodal Approach to Mitigate Hate Speech in Images
Dwip Dalal, Gautam Vashishtha, Anku Rani, Aishwarya Reganti, Parth Patwa, Mohd Sarique, Chandan Gupta, Keshav Nath, Viswanatha Reddy, Vinija Jain, Aman Chadha, Amitava Das, Amit Sheth, Asif Ekbal
TL;DR
This work tackles hate speech in multimodal content by creating the DeHate dataset and a diffusion-based dehatification pipeline. It introduces DeHater, a CLIP-guided, FiLM-conditioned model that generates masked outputs to blur hateful regions localized by Diffusion Attentive Attribution Maps (DAAM). The dataset and shared task use an IoU-based evaluation, with top-performing systems achieving approximately 0.55 IoU, illustrating the task's difficulty. The approach advances ethical AI for social media by enabling targeted dehatification and establishing a benchmark for multimodal hate mitigation, with future work including LLM-based justification and multilingual extension.
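For readers unfamiliar with the metric, intersection over union (IoU) between a predicted and a reference binary mask can be sketched as below. This is a minimal, dependency-free illustration and not the shared task's official scorer: the function name `mask_iou`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are all assumptions made for the example.

```python
def mask_iou(pred, target):
    """Intersection over union of two binary masks given as nested lists.

    Hypothetical helper for illustration; not the official evaluation script.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # pixel flagged in both masks
            union += p or t          # pixel flagged in at least one mask

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


predicted = [[1, 1, 0],
             [0, 0, 0]]
reference = [[0, 1, 1],
             [0, 0, 0]]
print(mask_iou(predicted, reference))  # one shared pixel out of three flagged
```

Under this definition, a score near 0.55 means that a little over half of the area covered by either mask is shared by both.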
Abstract
The rise in harmful online content not only distorts public discourse but also poses significant challenges to maintaining a healthy digital environment. In response, we introduce a multimodal dataset crafted for identifying hate in digital content. Central to our methodology is the application of watermarked, stability-enhanced Stable Diffusion techniques combined with Diffusion Attentive Attribution Maps (DAAM). This combination pinpoints the hateful elements within images and generates detailed hate attention maps, which are then used to blur those regions and so remove the hateful sections of the image. We release this dataset as part of the DeHate shared task, and this paper also describes the details of that task. Furthermore, we present DeHater, a vision-language model designed for multimodal dehatification tasks. Our approach sets a new standard in AI-driven image hate detection given textual prompts, contributing to the development of more ethical AI applications in social media.
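The blurring step described in the abstract can be illustrated with a small sketch: given a per-pixel attention map, pixels whose score reaches a threshold are replaced by a local average. This is only an illustration under stated assumptions. The function name `blur_masked_regions`, the `threshold` and `radius` parameters, the grayscale nested-list image format, and the use of a simple box blur are all hypothetical choices made here; the abstract does not specify the blur kernel or how the attention map is thresholded.

```python
def blur_masked_regions(image, attention, threshold=0.5, radius=1):
    """Box-blur the pixels whose attention score is at or above `threshold`.

    `image` and `attention` are same-shaped nested lists (grayscale values
    and scores in [0, 1]). Illustrative stand-in, not the paper's pipeline.
    """
    height, width = len(image), len(image[0])
    output = [row[:] for row in image]  # copy so the input stays unchanged

    for y in range(height):
        for x in range(width):
            if attention[y][x] < threshold:
                continue  # pixel not flagged: leave it untouched

            # Average over a (2 * radius + 1)^2 window, clipped at the borders.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            output[y][x] = sum(window) / len(window)

    return output


image = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
attention = [[0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0]]
blurred = blur_masked_regions(image, attention)
print(blurred[1][1])  # 1.0: the flagged pixel becomes the mean of its 3x3 window
```

A real pipeline would more likely operate on colour image arrays and apply a Gaussian blur from an imaging library, but the control flow is the same: threshold the attention map, then blur only the flagged region.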
