Wikipedia is Not a Dictionary, Delete! Text Classification as a Proxy for Analysing Wiki Deletion Discussions

Hsuvas Borkakoty; Luis Espinosa-Anke

TL;DR

This work tackles automated content moderation in wiki ecosystems by constructing a multilingual, multi-platform corpus of deletion discussions and evaluating a range of language models on outcome, stance, and policy prediction tasks. It shows that discussions ending in deletion are comparatively easier to predict than those with other outcomes, and that self-reported tags have inconsistent utility, especially when masked. The paper reports strong baselines from RoBERTa and XLM-R models, with SetFit offering notable gains on small datasets and in cross-platform settings, which highlights the value of efficient, cross-domain approaches. The findings have practical implications for deploying automated moderation tools across diverse wiki communities and languages, and can inform model choice and deployment strategy in production systems.

Abstract

Automated content moderation for collaborative knowledge hubs like Wikipedia or Wikidata is an important yet challenging task due to multiple factors. In this paper, we construct a database of discussions happening around articles marked for deletion in several Wikis and in three languages, which we then use to evaluate a range of LMs on different tasks (from predicting the outcome of the discussion to identifying the implicit policy an individual comment might be pointing to). Our results reveal, among other things, that discussions leading to deletion are easier to predict, and that, surprisingly, self-produced tags (keep, delete or redirect) do not always help guide the classifiers, presumably because of users' hesitation or deliberation within comments.
