Talking Wikidata: Communication patterns and their impact on community engagement in collaborative knowledge graphs
Elisavet Koutsiana, Ioannis Reklos, Kholoud Saad Alghamdi, Nitisha Jain, Albert Meroño-Peñuela, Elena Simperl
TL;DR
This study analyzes Wikidata's editor interactions to understand long-term community engagement in a large-scale collaborative knowledge graph. Using a mixed-methods pipeline that combines network analysis, graph embeddings, and text embeddings, the authors show that Wikidata discussions form a small-world, inclusive network whose dynamics are shaped by both topology and discourse content. The results reveal that editor account age and discussion content strongly influence sustained participation, while editor roles and access levels are less predictive. Based on these findings, the authors propose concrete recommendations—including guidance for discussions, post-monitoring systems, mentoring, and templates—that can enhance engagement, sustainability, and knowledge quality in Wikidata and other semantic web communities.
Abstract
We study collaboration patterns in Wikidata, one of the world's largest open-source collaborative knowledge graph (KG) communities. Collaborative KG communities play a key role in structuring machine-readable knowledge to support AI systems such as conversational agents. However, these communities face challenges related to long-term member engagement, as a small subset of contributors is often responsible for the majority of contributions and decision-making. While prior research has explored contributors' roles and lifespans, discussions within collaborative KG communities remain understudied. To fill this gap, we investigated the behavioural patterns of contributors and the factors affecting their communication and participation. We analysed all the discussions on Wikidata using a mixed-methods approach, including statistical tests, network analysis, and text and graph embedding representations. Our findings reveal that the interactions between Wikidata editors form a small-world network that is inclusive and resilient to dropouts, where both the network topology and the discussion content influence the continuity of conversations. Furthermore, the account age of Wikidata members and their conversations are significant factors in their long-term engagement with the project. Our observations and recommendations can benefit the Wikidata and semantic web communities, providing guidance on how to improve collaborative environments for sustainability, growth, and quality.
