Modelling the Spread of Toxicity and Exploring its Mitigation on Online Social Networks

Aatman Vaidya, Harsh Bhagat, Seema Nagar, Amit A. Nanavati

TL;DR

The paper reframes the spread of toxicity on online social networks as a transformation process: each user acts as a transformer that applies a shift to incoming toxicity before forwarding it. It introduces a three-category taxonomy (Amplifiers, Attenuators, Copycats) and a shift-based propagation model, supported by a temporal analysis of Twitter, Gab, and Koo showing that toxicity is not conserved in the network and that there is no evidence of homophily among behaviour-changing users. An intervention, peace-bots that emit zero toxicity, is proposed and evaluated; its mitigation effectiveness depends on network topology and bot placement, and no universal deployment strategy emerges. The findings highlight the need for network-aware moderation and offer a principled soft-intervention framework that reduces exposure to toxic content without removing users or links. The dynamics are grounded in real data and systematic simulations, and are formalized as $O(u,t) = I_{avg}(u,t) + s(c(u,t), I_{avg}(u,t))$, where $O(u,t)$ is the output toxicity of user $u$ at time $t$, $I_{avg}(u,t)$ is the user's average incoming toxicity, $c(u,t)$ is the user's category, and $s$ is a shift sampled according to that category.

Abstract

Hate speech on online platforms has been credibly linked to multiple instances of real-world violence. This creates an urgent need to understand how toxic content spreads on online social networks and how it might be mitigated, a topic that has attracted extensive research in recent times. Prior work has largely modelled hate through epidemic or spread-activation-based diffusion models, in which users are often divided into two categories, hateful or not. In this work, users are treated as transformers of toxicity, based on how they respond to incoming toxicity. Compared with the incoming toxicity, users amplify, attenuate, or replicate (effectively, transform) the toxicity and send it forward. We perform a temporal analysis of toxicity on Twitter, Koo and Gab and find that (a) toxicity is not conserved in the network; (b) only a subset of users change behaviour over time; and (c) there is no evidence of homophily among behaviour-changing users. In our model, each user transforms incoming toxicity by applying a "shift" to it before sending it forward. Based on this, we develop a network model of toxicity spread that incorporates the time-varying behaviour of users. We find that the "shift" applied by a user depends on the input toxicity and the user's category. Based on this finding, we propose an intervention strategy for toxicity reduction, which we simulate by deploying peace-bots. Through experiments on both real-world and synthetic networks, we demonstrate that peace-bot interventions can reduce toxicity, though their effectiveness depends on network structure and placement strategy.
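
The update rule described above (output toxicity equals average incoming toxicity plus a category-dependent shift, with peace-bots emitting zero toxicity) can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation: the function names, the specific shift formulas, the optional Gaussian noise, and the clipping of toxicity to [0, 1] are all assumptions made for the example. The paper samples shifts from category-dependent distributions that are not reproduced here.

```python
import random


def sample_shift(category, i_avg, rng, noise=0.0):
    """Hypothetical s(c, I_avg): the shift depends on category and input toxicity."""
    if category == "amplifier":
        base = 0.5 * (1.0 - i_avg)   # assumed: push output above the input
    elif category == "attenuator":
        base = -0.5 * i_avg          # assumed: pull output below the input
    else:                            # copycat: forward roughly what arrives
        base = 0.0
    eps = rng.gauss(0.0, noise) if noise > 0 else 0.0
    return base + eps


def step(in_neighbours, category, output, peace_bots=frozenset(), rng=None, noise=0.0):
    """One synchronous update of O(u,t) = I_avg(u,t) + s(c(u,t), I_avg(u,t))."""
    rng = rng or random.Random(0)
    new_output = {}
    for u, preds in in_neighbours.items():
        if u in peace_bots:
            new_output[u] = 0.0          # peace-bots always emit zero toxicity
        elif not preds:
            new_output[u] = output[u]    # assumed: no input means no change
        else:
            i_avg = sum(output[v] for v in preds) / len(preds)
            o = i_avg + sample_shift(category[u], i_avg, rng, noise)
            new_output[u] = min(1.0, max(0.0, o))  # assumed score range [0, 1]
    return new_output


# Tiny hypothetical network: a -> b, a -> c, b -> c
in_neighbours = {"a": [], "b": ["a"], "c": ["a", "b"]}
category = {"a": "copycat", "b": "amplifier", "c": "attenuator"}
output = {"a": 0.4, "b": 0.0, "c": 0.0}

baseline = step(in_neighbours, category, output)
with_bot = step(in_neighbours, category, output, peace_bots={"b"})
```

Replacing the amplifier `b` with a peace-bot forces its output to zero in `with_bot`. How far that reduction carries downstream depends on where `b` sits in the graph, which mirrors the abstract's point that placement and network structure matter.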

Paper Structure

This paper contains 18 sections, 4 equations, 9 figures, 8 tables, 1 algorithm.

Figures (9)

  • Figure 1: Users viewed as transformers of toxicity: amplifiers, whose output toxicity is higher than their input toxicity; copycats, whose output toxicity is almost the same as their input toxicity; and attenuators, whose output toxicity is lower than their input toxicity.
  • Figure 2: The flow of questions explored in the paper, beginning with the understanding that there are three categories of users.
  • Figure 3: Average Toxicity over time in all the datasets.
  • Figure 4: Distribution of the difference between a user's average toxicity and the average toxicity of its in-degree neighbourhood in all three datasets. All the distributions fail the Shapiro–Wilk (SW) test for normality.
  • Figure 5: Box-and-whisker plot for the distributions in Figure 4. Since the data are not normally distributed, we use the IQR method to detect outliers and separate them from the remaining users. Outlier users on the right are called amplifiers, those on the left are called attenuators, and the remaining users are called copycats.
  • ...and 4 more figures
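
The IQR-based categorisation described in the captions of Figures 4 and 5 can be illustrated with a short Python sketch. It assumes the conventional Tukey fences at 1.5 × IQR and the default quantile method of Python's `statistics.quantiles`; the captions only say "the IQR method", so the multiplier `k`, the function name `classify_users`, and the sample values are hypothetical.

```python
from statistics import quantiles


def classify_users(diff, k=1.5):
    """Label users from d(u) = user's average toxicity minus the average
    toxicity of the user's in-degree neighbourhood.

    Right-side outliers -> amplifier, left-side outliers -> attenuator,
    everyone else -> copycat. k=1.5 is the usual Tukey fence (an assumption).
    """
    q1, _, q3 = quantiles(list(diff.values()), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    labels = {}
    for user, d in diff.items():
        if d > high:
            labels[user] = "amplifier"
        elif d < low:
            labels[user] = "attenuator"
        else:
            labels[user] = "copycat"
    return labels


# Hypothetical differences for seven users
diff = {"u1": -0.02, "u2": -0.01, "u3": 0.0, "u4": 0.01,
        "u5": 0.02, "u6": 0.6, "u7": -0.6}
labels = classify_users(diff)
```

In this toy sample the quartiles are -0.02 and 0.02, so the fences fall at ±0.08: `u6` is labelled an amplifier, `u7` an attenuator, and the rest copycats.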