
Understanding and Predicting Derailment in Toxic Conversations on GitHub

Mia Mohammad Imran, Robert Zita, Rebekah Copeland, Preetha Chatterjee, Rahat Rizvi Rahman, Kostadin Damevski

TL;DR

The paper tackles proactive moderation in GitHub communities by studying conversational derailment that leads to toxicity. It builds a dataset of 202 toxic conversations annotated with derailment points and 696 non-toxic conversations, analyzes linguistic and interaction cues, and identifies key signals such as second-person pronouns and incivility-related tone-bearing discussion features (TBDFs). It then proposes a proactive moderation method that uses LLM-generated Summaries of Conversation Dynamics with a least-to-most prompting strategy, achieving an F1-score of about 0.70 in predicting derailment and outperforming baselines. The work demonstrates a practical approach to real-time moderation, offers actionable insights for reducing toxicity, and makes its datasets and prompts publicly available for replication and extension.
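To make the least-to-most idea concrete, here is a minimal sketch of a prompting chain that builds a Summary of Conversation Dynamics step by step and then asks for a derailment verdict. It assumes a generic `complete(prompt) -> str` LLM callable. The `Comment` type, the sub-questions, the prompt wording, and the yes/no parsing are all hypothetical illustrations, not the authors' released prompts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Comment:
    author: str
    is_project_member: bool  # hypothetical flag: project contributor vs. external participant
    text: str


def format_thread(comments: List[Comment]) -> str:
    """Render a GitHub thread as numbered lines tagged with each author's role."""
    lines = []
    for i, c in enumerate(comments, start=1):
        role = "project member" if c.is_project_member else "external participant"
        lines.append(f"[{i}] {c.author} ({role}): {c.text}")
    return "\n".join(lines)


# Hypothetical sub-questions, ordered from easiest to hardest (least-to-most).
SUBQUESTIONS = [
    "What is the technical topic of this conversation?",
    "How does each participant's tone change from comment to comment?",
    "Summarize the conversation dynamics: who pushes back on whom, and how?",
]


def predict_derailment(comments: List[Comment], complete: Callable[[str], str]) -> bool:
    """Answer each sub-question in turn, then ask for a final derailment verdict."""
    context = f"GitHub conversation:\n{format_thread(comments)}\n"
    for q in SUBQUESTIONS:
        answer = complete(f"{context}\nQ: {q}\nA:")
        # Append each answer so later, harder steps can build on earlier ones.
        context += f"\nQ: {q}\nA: {answer}\n"
    verdict = complete(
        f"{context}\nQ: Based on the summary above, is this conversation likely "
        "to derail into toxicity? Answer yes or no.\nA:"
    )
    return verdict.strip().lower().startswith("yes")
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular provider's client library; a real deployment would substitute an actual model call and the study's own prompts.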

Abstract

Software projects thrive on the involvement and contributions of individuals from different backgrounds. However, toxic language and negative interactions can hinder the participation and retention of contributors and alienate newcomers. Proactive moderation strategies aim to prevent toxicity by addressing conversations that have derailed from their intended purpose. This study aims to understand and predict conversational derailment leading to toxicity on GitHub. To facilitate this research, we curate a novel dataset comprising 202 toxic conversations from GitHub with annotated derailment points, along with 696 non-toxic conversations as a baseline. Based on this dataset, we identify unique characteristics of toxic conversations and derailment points, including linguistic markers such as second-person pronouns, negation terms, and tones of Bitter Frustration and Impatience, as well as patterns in conversational dynamics between project contributors and external participants. Leveraging these empirical observations, we propose a proactive moderation approach to automatically detect and address potentially harmful conversations before escalation. Using modern LLMs, we develop a conversation trajectory summary technique that captures the evolution of discussions and identifies early signs of derailment. Our experiments demonstrate that LLM prompts tailored to summarize GitHub conversations achieve an F1-score of 70% in predicting conversational derailment, substantially outperforming a set of baseline approaches.
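As an illustration of the kind of linguistic markers and evaluation metric mentioned above, the following sketch counts second-person pronouns and negation terms in a comment and computes the F1-score from binary derailment labels. The word lists are small hypothetical lexicons and the regex tokenizer is a simplification; neither is taken from the study.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately small lexicons; the study's actual word lists may differ.
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves",
                 "you're", "you've", "you'll", "you'd"}
NEGATIONS = {"no", "not", "never", "none", "nothing", "nobody", "can't",
             "don't", "won't", "isn't", "doesn't", "didn't", "shouldn't"}

# Lowercase word tokens, optionally with one ASCII-apostrophe suffix ("don't").
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def marker_counts(text: str) -> Dict[str, int]:
    """Count second-person pronouns and negation terms in one comment."""
    tokens = TOKEN.findall(text.lower())
    return {
        "second_person": sum(t in SECOND_PERSON for t in tokens),
        "negation": sum(t in NEGATIONS for t in tokens),
        "tokens": len(tokens),
    }


def f1_score(y_true: List[bool], y_pred: List[bool]) -> float:
    """Harmonic mean of precision and recall for the positive (derailing) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # precision or recall is zero, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `marker_counts("You don't read your own docs, do you?")` finds three second-person pronouns and one negation among eight tokens. Raw counts like these would typically be normalized by comment length before comparing toxic and non-toxic threads.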

Paper Structure

This paper contains 34 sections, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Example of a toxic conversation on GitHub.
  • Figure 2: Participants in different types of GitHub conversations.
  • Figure 3: Percentage of project participants' comments in GitHub conversation threads ($N_{\text{Toxic}} = 202$; $N_{\text{Non-toxic}} = 696$).