Characterizing Knowledge Manipulation in a Russian Wikipedia Fork
Mykola Trokhymovych, Oleksandr Kosovan, Nathan Forrester, Pablo Aragón, Diego Saez-Trumper, Ricardo Baeza-Yates
TL;DR
This study analyzes a Russian Wikipedia fork (RWFork) to characterize how original Russian Wikipedia content may be manipulated to align with national regulations. It introduces a data-intensive methodology that compares 1.9 million article pairs, extracting text changes, categories, sources, and named entities, and uses NLP-driven clustering with GPT-4o-mini and embeddings to derive an eight-cluster taxonomy of manipulation patterns. The findings show that a small subset of highly viewed articles undergoes changes concentrated on topics related to the Ukraine conflict, with systematic edits altering terminology, territorial designations, and sources, while a substantial portion of edits are non-textual metadata adjustments. The work highlights implications for knowledge integrity and for the training data used by large language models, and it provides a replicable framework and open data for studying other forks and similar platforms.
Abstract
Wikipedia is powered by MediaWiki, free and open-source software that also serves as the infrastructure for many other wiki-based online encyclopedias. These include the recently launched website Ruwiki, which has copied and modified the original Russian Wikipedia content to conform to Russian law. To identify practices and narratives that could be associated with different forms of knowledge manipulation, this article presents an in-depth analysis of this Russian Wikipedia fork. We propose a methodology to characterize the main changes with respect to the original version. The foundation of this study is a comprehensive comparative analysis of more than 1.9M articles from Russian Wikipedia and its fork. Using meta-information and geographical, temporal, categorical, and textual features, we explore the changes made by Ruwiki editors. Furthermore, we present a classification of the main topics of knowledge manipulation in this fork, including a numerical estimate of their scope. This research not only sheds light on significant changes within Ruwiki, but also provides a methodology that could be applied to analyze other Wikipedia forks and similar collaborative projects.
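As a rough illustration of what comparing one article pair could involve, the sketch below extracts sentence-level differences between an original text and its forked counterpart using Python's standard `difflib`. It is not the authors' pipeline: the function names `split_sentences` and `diff_article_pair`, the naive regex sentence splitter, and the placeholder texts are all assumptions made for this example.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def diff_article_pair(original, fork):
    """Return the sentence-level changes between two versions of one article.

    Each change records the opcode ("replace", "delete" or "insert") together
    with the affected sentences on the original side and on the fork side.
    """
    a = split_sentences(original)
    b = split_sentences(fork)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append({"op": tag, "original": a[i1:i2], "fork": b[j1:j2]})
    return changes


# Placeholder texts, not taken from either encyclopedia.
original_text = "Alpha is a city. It was founded in 1900. It has a port."
fork_text = "Alpha is a city. It was established in 1900. It has a port."

for change in diff_article_pair(original_text, fork_text):
    print(change["op"], change["original"], "->", change["fork"])
```

Pairs that yield an empty list can be treated as textually unchanged, which leaves only non-textual differences such as metadata to inspect. At the scale of millions of pairs, a real analysis would also need proper Russian sentence segmentation and wikitext parsing, both of which this sketch omits.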
