From Intrinsic Toxicity to Reception-Based Toxicity: A Contextual Framework for Prediction and Evaluation
Sergey Berezin, Reza Farahbakhsh, Noel Crespi
TL;DR
The paper reframes toxicity as a context-dependent, socially emergent signal rather than an intrinsic property of text. It introduces the Contextual Stress Framework (CSF), which formalizes toxicity as the intersection of perceived norm violation and audience stress, and a concrete reception-based metric, PONOS (Proportion Of Negative Observed Sentiments), which quantifies negative reception within a community. The paper demonstrates that PONOS can be estimated from post text or inferred from modeled reactions, and that domain-specific pretraining improves performance. Empirical validation on a large Reddit dataset shows that reception-based toxicity and intrinsic text-based toxicity are related but non-equivalent axes that provide complementary information and help address context-specific harms such as dialect and in-group language. The work offers a path toward more context-aware moderation and harm assessment, and it discusses limitations, ethical considerations, and directions for extending the framework to richer signals and cross-domain applicability.
Abstract
Most toxicity detection models treat toxicity as an intrinsic property of text, overlooking the role of context in shaping its impact. In this position paper, drawing on insights from psychology, neuroscience, and computational social science, we reconceptualise toxicity as a socially emergent signal of stress. We formalise this perspective in the Contextual Stress Framework (CSF), which defines toxicity as a stress-inducing norm violation within a given context and introduces an additional dimension for toxicity detection. As one possible realisation of CSF, we introduce PONOS (Proportion Of Negative Observed Sentiments), a metric that quantifies toxicity through collective social reception rather than lexical features. We validate this approach on a novel dataset, demonstrating improved contextual sensitivity and adaptability when used alongside existing models.
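To make the idea of a reception-based score concrete, the following is a minimal sketch of a proportion-of-negative-reactions computation. It is an illustration inferred only from the metric's name, not the authors' implementation: the function name `ponos`, the string labels, and the assumption that each observed reaction to a post has already been assigned a sentiment label by some upstream classifier are all hypothetical.

```python
from typing import Iterable


def ponos(reaction_sentiments: Iterable[str], negative_label: str = "negative") -> float:
    """Return the share of observed reactions labelled as negative.

    Assumption: each element is a sentiment label (e.g. "negative",
    "neutral", "positive") produced upstream for one reaction to a post.
    """
    labels = list(reaction_sentiments)
    if not labels:
        # With no observed reactions there is no reception to measure.
        raise ValueError("score is undefined for a post with no observed reactions")
    negative_count = sum(1 for label in labels if label == negative_label)
    return negative_count / len(labels)


# Hypothetical example: four reactions to one post, two of them negative.
example_score = ponos(["negative", "positive", "neutral", "negative"])
```

Under these assumptions the score depends only on how the audience reacted, so the same text can score differently in different communities, which is the contextual sensitivity the abstract describes.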
