Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions

Lucie-Aimée Kaffee; Arnav Arora; Isabelle Augenstein

TL;DR

We construct a novel multilingual dataset of Wikipedia editor discussions, together with the editors' reasoning, in three languages, and demonstrate that stance and the corresponding reason can be predicted jointly with a high degree of accuracy, adding transparency to the decision-making process.

Abstract

The moderation of content on online platforms is usually non-transparent. On Wikipedia, however, moderation decisions are discussed publicly, and editors are encouraged to cite content moderation policies to explain their decisions. Currently, only a few comments explicitly mention those policies: 20% of the English comments, but as few as 2% of the German and Turkish ones. To help understand how content is moderated, we construct a novel multilingual dataset of Wikipedia editor discussions, together with the editors' reasoning, in three languages. For each edit decision, the dataset contains the editor's stance (keep, delete, merge, comment), the stated reason, and a content moderation policy. We demonstrate that stance and the corresponding reason (policy) can be predicted jointly with a high degree of accuracy, adding transparency to the decision-making process. We release both our joint prediction models and the multilingual content moderation dataset for further research on automated transparent content moderation.
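To make "predicting stance and policy jointly" concrete, here is a minimal sketch of one common way to frame such a task: a shared model emits two sets of scores (one over the four stance labels, one over policies), training minimizes the sum of the two cross-entropy losses, and prediction takes the best label from each head. This is an illustration under stated assumptions, not the paper's released model: the abstract does not say how the two predictions are combined. The record layout `DeletionComment`, the functions `joint_loss` and `predict`, the `weight` hyperparameter, and the example policy names are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# The four stance labels named in the abstract.
STANCES = ("keep", "delete", "merge", "comment")


@dataclass
class DeletionComment:
    """Hypothetical record for one editor comment (field names are illustrative)."""
    text: str
    language: str          # e.g. "en", "de", "tr"
    stance: str            # one of STANCES
    policy: Optional[str]  # content moderation policy linked to the comment, if any


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class under a softmax."""
    return -log_softmax(logits)[target]


def joint_loss(stance_logits: Sequence[float], policy_logits: Sequence[float],
               stance_idx: int, policy_idx: int, weight: float = 1.0) -> float:
    """Sum of stance and policy losses; `weight` (assumed) scales the policy term."""
    return (cross_entropy(stance_logits, stance_idx)
            + weight * cross_entropy(policy_logits, policy_idx))


def predict(stance_logits: Sequence[float], policy_logits: Sequence[float],
            policies: Sequence[str]) -> Tuple[str, str]:
    """Return the highest-scoring stance and policy, chosen independently per head."""
    s = max(range(len(stance_logits)), key=lambda i: stance_logits[i])
    p = max(range(len(policy_logits)), key=lambda i: policy_logits[i])
    return STANCES[s], policies[p]


if __name__ == "__main__":
    # Hypothetical policy inventory; the real label set comes from the dataset.
    policies = ["notability", "verifiability", "no original research"]
    print(predict([0.1, 2.0, 0.3, -1.0], [0.5, 0.2, 1.5], policies))
    # → ('delete', 'no original research')
```

Summing the two losses lets a shared encoder learn features useful for both the stance and its justifying policy, which is what pairs each predicted stance with an explicit reason.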

Paper Structure (87 sections, 4 figures, 9 tables)

Introduction
Related Work
Policies for Content Moderation
Deletion Discussion on Wikipedia
Transparent Stance Detection
Transparent Stance Detection
Policy Prediction
Stance Detection
Transparent Stance Detection
Multilingual Transparent Stance Detection
Dataset
Dataset Creation
Label Analysis
Multilingual Dataset
Dataset Statistics

Figures (4)

Figure 1: Example comment from the dataset
Figure 2: Overview of the approach for policy prediction and stance detection
Figure 3: The 15 most frequently used policies across comments for (a) English, (b) German, and (c) Turkish.
Figure 4: Most salient bi-grams for each label in the training set