Beyond Trial-and-Error: Predicting User Abandonment After a Moderation Intervention

Benedetta Tessa, Lorenzo Cima, Amaury Trujillo, Marco Avvenuti, Stefano Cresci

TL;DR

The paper tackles a novel task: predicting user abandonment after a large-scale moderation intervention on Reddit. Using 142 features that describe users' pre-ban activity, toxicity, writing style, and relations, together with ground-truth labels derived from post-ban activity, the authors train classifiers to identify abandoning users. The best model, Gradient Boosting, reaches a micro F1 of $0.914$ for hard abandonment and $0.814$ for soft abandonment. Leave-one-out validation across subreddits indicates that the model generalizes to users from unseen communities. Activity features are the most informative, followed by relational and toxicity features, while writing style adds little predictive power. The work demonstrates the feasibility of predictive moderation: a data-driven tool to anticipate intervention outcomes and plan strategies, which could be extended to quantification or regression analyses of aggregate effects. The findings also underscore practical considerations around class imbalance, feature selection, and ethical deployment on real platforms.

Abstract

Current content moderation follows a reactive, trial-and-error approach, where interventions are applied and their effects are only measured post-hoc. In contrast, we introduce a proactive, predictive approach that enables moderators to anticipate the impact of their actions before implementation. We propose and tackle the new task of predicting user abandonment following a moderation intervention. We study the reactions of 16,540 users to a massive ban of online communities on Reddit, training a set of binary classifiers to identify those users who would abandon the platform after the intervention -- a problem of great practical relevance. We leverage a dataset of 13.8 million posts to compute a large and diverse set of 142 features, which convey information about the activity, toxicity, relations, and writing style of the users. We obtain promising results, with the best-performing model achieving micro F1-score = 0.914. Our model shows robust generalizability when applied to users from previously unseen communities. Furthermore, we identify activity features as the most informative predictors, followed by relational and toxicity features, while writing style features exhibit limited utility. Theoretically, our results demonstrate the feasibility of adopting a predictive machine learning approach to estimate the effects of moderation interventions. Practically, this work marks a fundamental shift from reactive to predictive moderation, equipping platform administrators with intelligent tools to strategically plan interventions, minimize unintended consequences, and optimize user engagement.
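
The headline metric above is the micro-averaged F1-score. As a minimal sketch of the standard definition (not the authors' evaluation code), the snippet below pools true positives, false positives, and false negatives over both classes before computing F1. The toy labels are invented for illustration, with 1 meaning "abandons" and 0 meaning "stays". In a single-label setting such as this binary task, micro F1 computed over all classes coincides with accuracy, which is worth keeping in mind when classes are imbalanced.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label and t != label:
                fp += 1
            elif t == label and p != label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_for_class(y_true, y_pred, label):
    """F1 restricted to a single class (here: the abandoning users)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy, made-up labels: 7 users stay (0), 3 abandon (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
micro = micro_f1(y_true, y_pred)
minority = f1_for_class(y_true, y_pred, label=1)

print(f"accuracy={accuracy:.3f} micro_f1={micro:.3f} f1_abandon={minority:.3f}")
```

On this toy example the micro F1 matches the accuracy, while the F1 for the minority class alone is lower, which is why per-class scores are a useful complement when the classes are imbalanced.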

Paper Structure (46 sections, 5 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Overview of the moderation process with a predictive approach. Our work on predicting user abandonment following a large-scale moderation intervention is an example of how simulating different interventions and outcomes can improve platform moderation processes by making better-informed decisions possible.
  • Figure 2: Problem definition time frame. Given an intervention at $t_0$, we want to predict user abandonment in $P^+$, based on user data from $P^-$ and inactivity during a time window from offset $t_{+a}$ to $t_{+n}$.
  • Figure 3: Composition of our datasets and data collection periods. DB-B: data from within the banned subreddits, before the ban. Dataset DB-B is used to select representative users from the banned subreddits. DN-B: data from non-banned subreddits, before the ban. Datasets DB-B and DN-B (i.e., both pre-ban datasets) are used to compute machine learning features. DN-A: data from non-banned subreddits, after the ban. Dataset DN-A is used to provide ground-truth labels for the users based on their activity post-ban.
  • Figure 4: Machine learning pipeline. Our data preparation steps involve feature extraction, outlier removal, ground-truth labeling, and splitting of the dataset into a training and a test set. The training set is used for model training and optimization. This involves feature scaling and selection, data rebalancing, and hyperparameter optimization. Finally, the optimized models are evaluated on the held-out test set.
  • Figure 5: Precision-Recall curves for the Gradient Boosting (GB) model in predicting hard and soft abandonment. The curves illustrate the trade-off between precision and recall, allowing for an assessment of how well the model balances false positives and false negatives in both tasks.
  • ...and 3 more figures
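
The problem definition in the Figure 2 caption can be illustrated with a small labeling sketch. It assumes that a user counts as abandoning when they have no posts between the offsets $t_{+a}$ and $t_{+n}$ after the intervention at $t_0$. The function name `label_abandonment`, the day-based offsets, the intervention date, and the user data are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from datetime import datetime, timedelta


def label_abandonment(post_times, t0, offset_start_days, offset_end_days):
    """Return True if the user has no post in [t0 + start, t0 + end].

    The offsets play the role of t_{+a} and t_{+n}; expressing them in
    days is an assumption of this sketch.
    """
    window_start = t0 + timedelta(days=offset_start_days)
    window_end = t0 + timedelta(days=offset_end_days)
    return not any(window_start <= ts <= window_end for ts in post_times)


# Hypothetical intervention date and post timestamps.
t0 = datetime(2020, 1, 1)
posts_by_user = {
    "user_a": [t0 - timedelta(days=10), t0 + timedelta(days=5)],
    "user_b": [t0 - timedelta(days=3), t0 + timedelta(days=40)],
}

# Label each user by inactivity between day 30 and day 90 after t0.
labels = {
    user: label_abandonment(times, t0, offset_start_days=30, offset_end_days=90)
    for user, times in posts_by_user.items()
}
print(labels)
```

Here `user_a` posts shortly after the intervention but not inside the window, so the sketch labels them as abandoning, while `user_b` is active inside the window and is not. Features for prediction would come only from the period before $t_0$, so the labels never leak into the inputs.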