A Community-Centric Perspective for Characterizing and Detecting Anti-Asian Violence-Provoking Speech
Gaurav Verma, Rynaa Grover, Jiawei Zhou, Binny Mathew, Jordan Kraemer, Munmun De Choudhury, Srijan Kumar
TL;DR
This paper addresses violence-provoking speech against Asian communities during the COVID-19 era by adopting a community-centric approach that combines a theoretically grounded codebook with crowd-sourced annotations and state-of-the-art NLP classifiers. It leverages a large-scale Twitter dataset (~420k posts) and a dense, community-annotated subset to quantify differences between violence-provoking and hateful speech. Hateful speech detection reaches $F_1 = 0.89$, while violence-provoking speech detection remains challenging at $F_1 = 0.69$. The authors conduct a four-part study (data collection, codebook development, community annotations, classifier evaluation), validate the codebook with substantial inter-annotator agreement, and compare multiple models, including RoBERTa-large and Mixtral-Ins, highlighting gaps in current NLP capabilities for nuanced violence-provoking cues. They discuss policy and practical implications, including tiered moderation penalties and trauma-informed community support, and provide open access to resources and data to foster further research and safer online ecosystems.
Abstract
Violence-provoking speech, that is, speech that implicitly or explicitly promotes violence against members of the targeted community, contributed to a massive surge in anti-Asian crimes during the pandemic. While previous works have characterized and built tools for detecting other forms of harmful speech, like fear speech and hate speech, our work takes a community-centric approach to studying anti-Asian violence-provoking speech. Using data from ~420k Twitter posts spanning a 3-year duration (January 1, 2020 to February 1, 2023), we develop a codebook to characterize anti-Asian violence-provoking speech and collect a community-crowdsourced dataset to facilitate its large-scale detection using state-of-the-art classifiers. We contrast the capabilities of natural language processing classifiers, ranging from BERT-based to LLM-based classifiers, in detecting violence-provoking speech with their capabilities in detecting anti-Asian hateful speech. In contrast to prior work that has demonstrated the effectiveness of such classifiers in detecting hateful speech ($F_1 = 0.89$), our work shows that accurate and reliable detection of violence-provoking speech is a challenging task ($F_1 = 0.69$). We discuss the implications of our findings, particularly the need for proactive interventions to support Asian communities during public health crises. The resources related to the study are available at https://claws-lab.github.io/violence-provoking-speech/.
