Measuring and Forecasting Conversation Incivility: the Role of Antisocial and Prosocial Behaviors
Xinchen Yu, Hayden Arnold, Benjamin Su, Eduardo Blanco
TL;DR
The paper presents a multi-dimensional framework for measuring and forecasting conversation incivility following replies to hate speech, covering four dimensions each of antisocial and prosocial behavior. It introduces a data-driven incivility metric S(r) that combines antisocial, prosocial, and neutral signals with per-user weighting, and validates the metric against human judgments, reaching substantial agreement. Empirical analyses on a large Reddit dataset reveal distinct linguistic patterns for antisocial and prosocial posts and show that antisocial content more often prompts re-engagement, while prosocial content is prevalent in longer, multi-turn discussions. Supervised models that use the hate post and the reply as context outperform GPT-4o prompting at predicting incivility levels, with pretraining and data blending providing additional gains; error analysis highlights challenges such as sarcasm and negation. Overall, the work offers an approach for anticipating escalation in online discourse and informs moderation strategies that promote civil engagement, while acknowledging methodological and domain limitations.
Abstract
This paper focuses on the task of measuring and forecasting incivility in conversations following replies to hate speech. Identifying replies that steer conversations away from hatred and elicit civil follow-up conversations sheds light on effective strategies to engage with hate speech and proactively avoid further escalation. We propose new metrics that take into account various dimensions of antisocial and prosocial behaviors to measure conversation incivility following replies to hate speech. Our best metric aligns with human perceptions better than prior work. Additionally, we present analyses on a) the language of antisocial and prosocial posts, b) the relationship between antisocial or prosocial posts and user interactions, and c) the language of replies to hate speech that elicit follow-up conversations with different incivility levels. We show that forecasting the incivility level of conversations following a reply to hate speech is a challenging task. We also present qualitative analyses to identify the most common errors made by our best model.
