Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads
Ramtin Ehsani, Mia Mohammad Imran, Robert Zita, Kostadin Damevski, Preetha Chatterjee
TL;DR
The paper addresses incivility in open source software (OSS) by building an annotated dataset of 404 locked GitHub issue threads and 5,961 comments drawn from 213 projects. Comments are labeled with the type of incivility using Tone Bearing Discussion Features (TBDFs), and each thread is labeled with the triggers, targets, and consequences of incivility; annotation was supported by a Streamlit tool and GPT-4-assisted quality control. Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs. The most common trigger is failed use of tool/code or error messages, the main targets are People and Code/tool, and the most common consequence is Discontinued further discussion. The dataset and its annotations can support incivility detection and mitigation tools tailored to software engineering, analyses of moderation practices, and studies of how incivility affects project health and participation.
Abstract
In the dynamic landscape of open source software (OSS) development, understanding and addressing incivility within issue discussions is crucial for fostering healthy and productive collaborations. This paper presents a curated dataset of 404 locked GitHub issue discussion threads and 5,961 individual comments, collected from 213 OSS projects. We annotated the comments with various categories of incivility using Tone Bearing Discussion Features (TBDFs), and, for each issue thread, we annotated the triggers, targets, and consequences of incivility. We observed that Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs exhibited in our dataset. The most common trigger, target, and consequence of incivility are Failed use of tool/code or error messages, People, and Discontinued further discussion, respectively. This dataset can serve as a valuable resource for analyzing incivility in OSS and improving automated tools to detect and mitigate such behavior.
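The two annotation levels described above (a TBDF label per comment; a trigger, target, and consequence per thread) can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the class names, field names, and the `tbdf_counts` helper are hypothetical and do not reflect the dataset's actual file format or column names, and the two example threads are invented placeholders, not records from the dataset. Only the label strings are taken from the abstract and TL;DR.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    """One issue comment with its comment-level annotation (hypothetical schema)."""
    text: str
    tbdf: Optional[str] = None  # None means no uncivil TBDF was assigned


@dataclass
class IssueThread:
    """One locked issue thread with its thread-level annotations (hypothetical schema)."""
    project: str
    trigger: str
    target: str
    consequence: str
    comments: List[Comment] = field(default_factory=list)


def tbdf_counts(threads: List[IssueThread]) -> Counter:
    """Count how often each TBDF label appears across all comments."""
    return Counter(
        c.tbdf for t in threads for c in t.comments if c.tbdf is not None
    )


# Invented placeholder threads, for illustration only.
example_threads = [
    IssueThread(
        project="example/project-a",
        trigger="Failed use of tool/code or error messages",
        target="People",
        consequence="Discontinued further discussion",
        comments=[
            Comment("placeholder comment 1", tbdf="Bitter frustration"),
            Comment("placeholder comment 2", tbdf="Impatience"),
            Comment("placeholder comment 3"),
        ],
    ),
    IssueThread(
        project="example/project-b",
        trigger="Failed use of tool/code or error messages",
        target="Code/tool",
        consequence="Discontinued further discussion",
        comments=[
            Comment("placeholder comment 4", tbdf="Bitter frustration"),
            Comment("placeholder comment 5", tbdf="Mocking"),
        ],
    ),
]

counts = tbdf_counts(example_threads)
print(counts.most_common(1))  # [('Bitter frustration', 2)]
```

Keeping the TBDF label on the comment and the trigger, target, and consequence on the thread mirrors the granularity stated in the abstract; a prevalence ranking such as the one reported there would then be a simple count over comment-level labels.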
