A Test of Time: Predicting the Sustainable Success of Online Collaboration in Wikipedia
Abraham Israeli, David Jurgens, Daniel Romero
TL;DR
The paper introduces Sustainable Success as a long-term metric for online collaboration and applies it to Wikipedia using SustainPedia, a dataset aggregating over 40K articles and more than 300 explanatory features. A gradient-boosted tree model predicts whether a promoted article remains at a high-quality level, achieving an AU-ROC of up to 0.88, with SHAP analysis revealing Experience as the strongest predictor and a slower, gradual ascent to high-quality status as beneficial for sustainability. The work provides actionable insights, including the value of experienced editors and careful pacing of quality promotions, and offers SustainPedia data and code to enable broader study of long-term collaboration across domains like open-source software and online activism. The findings underscore the importance of maintenance, governance, and social dynamics in sustaining high-quality collaborative outputs over time, with implications for the design of future collaborative platforms.
Abstract
The Internet has significantly expanded the potential for global collaboration, allowing millions of users to contribute to collective projects like Wikipedia. While prior work has assessed the success of online collaborations, most approaches are time-agnostic, evaluating success without considering its longevity. Research on the factors that ensure the long-term preservation of high-quality standards in online collaboration is scarce. In this study, we address this gap. We propose a novel metric, `Sustainable Success,' which measures the ability of collaborative efforts to maintain their quality over time. Using Wikipedia as a case study, we introduce the SustainPedia dataset, which compiles data from over 40K Wikipedia articles, including each article's sustainable success label and more than 300 explanatory features such as edit history, user experience, and team composition. Using this dataset, we develop machine learning models to predict the sustainable success of Wikipedia articles. Our best-performing model achieves a high AU-ROC score of 0.88 on average. Our analysis reveals important insights. For example, we find that the longer an article takes to be recognized as high-quality, the more likely it is to maintain that status over time (i.e., be sustainable). Additionally, user experience emerged as the most critical predictor of sustainability. Our analysis provides insights into broader collective actions beyond Wikipedia (e.g., online activism, crowdsourced open-source software), where the same social dynamics that drive success on Wikipedia might play a role. We make all data and code used for this study publicly available for further research.
