Detecting Sockpuppetry on Wikipedia Using Meta-Learning
Luc Raszewski, Christine De Kock
TL;DR
This work tackles sockpuppetry detection on Wikipedia under data-scarce conditions by applying meta-learning to adapt rapidly to unseen puppetmasters. It frames each sockpuppet investigation as a separate task and learns a meta-model via Reptile across many tasks, enabling quick specialization with limited data. The authors release a new dataset of sockpuppet investigations (23,610 tasks) and propose a realistic, accused-user–centric task setup that mirrors real deployment. Empirically, meta-learning yields substantial improvements in AUROC, AUPRC, F1, $F_{0.5}$, and accuracy over non-meta baselines, primarily by increasing precision and reducing false positives. The authors acknowledge limitations when messages are absent and highlight ethical considerations. The work advances online-safety research by enabling more reliable, data-efficient sockpuppet detection and provides resources for ongoing meta-learning research in this domain.
Abstract
Malicious sockpuppet detection on Wikipedia is critical to preserving access to reliable information on the internet and preventing the spread of disinformation. Prior machine learning approaches rely on stylistic and metadata features, but do not prioritise adaptability to author-specific behaviours. As a result, they struggle to model the behaviour of specific sockpuppet groups, especially when text data is limited. To address this, we propose the application of meta-learning, a machine learning technique designed to improve performance in data-scarce settings by training models across multiple tasks. Meta-learning optimises a model for rapid adaptation to the writing style of a new sockpuppet group. Our results show that meta-learning significantly enhances the precision of predictions compared to pre-trained models, marking an advancement in combating sockpuppetry on open editing platforms. We release a new dataset of sockpuppet investigations to foster future research in both the sockpuppetry and meta-learning fields.
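To make the idea of "training across multiple tasks for rapid adaptation" concrete, the following is a minimal sketch of the Reptile update named in the TL;DR. It is not the authors' implementation: it assumes a toy one-dimensional regression problem in place of text classification, and every name in it (`sgd_steps`, `reptile`, `make_task`, the learning rates, the slope range) is hypothetical and chosen only for illustration. Each task plays the role of one investigation; the meta-parameters are nudged towards the parameters obtained after a few gradient steps on a sampled task.

```python
import random

def sgd_steps(params, task, steps, lr):
    """Inner loop: a few SGD steps of squared-error loss on a single task."""
    w, b = params
    for _ in range(steps):
        x, y = random.choice(task)
        err = (w * x + b) - y          # prediction error of the linear model
        w -= lr * err * x
        b -= lr * err
    return w, b

def reptile(tasks, meta_iters=2000, inner_steps=5, inner_lr=0.05, meta_lr=0.1, seed=0):
    """Outer loop: move the meta-parameters towards the task-adapted parameters."""
    random.seed(seed)
    theta = (0.0, 0.0)                 # meta-initialisation (w, b)
    for _ in range(meta_iters):
        task = random.choice(tasks)    # sample one task (hypothetical stand-in for an investigation)
        phi = sgd_steps(theta, task, inner_steps, inner_lr)
        # Reptile update: theta <- theta + meta_lr * (phi - theta)
        theta = tuple(t + meta_lr * (p - t) for t, p in zip(theta, phi))
    return theta

def make_task(slope, n=21):
    """Toy task (assumption): points on y = slope * x + 1 over a symmetric grid in [-1, 1]."""
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    return [(x, slope * x + 1.0) for x in xs]

def mse(params, task):
    """Mean squared error of the linear model on a task."""
    w, b = params
    return sum(((w * x + b) - y) ** 2 for x, y in task) / len(task)

# Meta-train on tasks whose slopes lie between 1 and 3.
train_tasks = [make_task(1.0 + 0.25 * k) for k in range(9)]
theta = reptile(train_tasks)

# Adapt to an unseen task with the same small budget, from both initialisations.
new_task = make_task(2.5)
loss_meta = mse(sgd_steps(theta, new_task, 5, 0.05), new_task)
loss_scratch = mse(sgd_steps((0.0, 0.0), new_task, 5, 0.05), new_task)
```

In this toy setting the meta-learned initialisation sits near the centre of the task family, so the same five adaptation steps leave a much smaller error on the unseen task than starting from zeros. The analogous claim in the paper concerns adapting a classifier to a new sockpuppet group from limited data; the sketch only illustrates the update rule, not the reported results.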
