Utilizing citation index and synthetic quality measure to compare Wikipedia languages across various topics

Włodzimierz Lewoniewski, Krzysztof Węcel, Witold Abramowicz

TL;DR

This study identified the most significant Wikipedia articles within distinct topical areas, selecting the top 10, top 25, and top 100 most cited articles in each topic and language version using a citation index alongside a synthetic quality measure.

Abstract

This study presents a comparative analysis of 55 Wikipedia language editions that employs a citation index alongside a synthetic quality measure. Specifically, we identified the most significant Wikipedia articles within distinct topical areas, selecting the top 10, top 25, and top 100 most cited articles in each topic and language version. The citation index was built from wikilinks between Wikipedia articles within each language version, which required processing 6.6 billion page-to-page link records. Next, we used a quality score for each Wikipedia article, a synthetic measure scaled from 0 to 100. This approach enabled quality comparison of Wikipedia articles even between language versions with different quality grading schemes. Our results highlight disparities among Wikipedia language editions, revealing strengths and gaps in content coverage and quality across topics.
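The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition of the citation index, so this assumes it is simply the number of incoming wikilinks an article receives within one language version. The function names, the toy link records, the topic labels, and the quality scores below are all hypothetical and serve only to show the flow: count links, pick the top N per topic, then average the 0 to 100 quality score.

```python
from collections import Counter, defaultdict

def citation_index(links):
    """Count incoming wikilinks per target article.

    `links` is an iterable of (source, target) page-to-page records
    for a single language version (assumed format).
    """
    return Counter(target for _source, target in links)

def top_cited_by_topic(index, topics, n):
    """Return the n most cited articles for each topic.

    Ties are broken alphabetically, which is an arbitrary choice
    made for this sketch.
    """
    by_topic = defaultdict(list)
    for article, topic in topics.items():
        by_topic[topic].append(article)
    return {
        topic: sorted(articles, key=lambda a: (-index.get(a, 0), a))[:n]
        for topic, articles in by_topic.items()
    }

def average_quality(articles, quality):
    """Average the 0-100 quality score over articles that have a score."""
    scores = [quality[a] for a in articles if a in quality]
    return sum(scores) / len(scores) if scores else None

# Toy data (hypothetical): six link records in one language version.
links = [("A", "B"), ("C", "B"), ("A", "C"), ("D", "B"), ("D", "C"), ("B", "E")]
topics = {"A": "history", "D": "history", "B": "science", "C": "science", "E": "science"}
quality = {"A": 50.0, "B": 80.0, "C": 60.0, "D": 30.0, "E": 20.0}

index = citation_index(links)
top2 = top_cited_by_topic(index, topics, n=2)
science_avg = average_quality(top2["science"], quality)
```

Repeating the last step for n = 10, 25, and 100 in every language version would yield per-topic averages of the kind compared in the study, under the assumptions stated above.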

Paper Structure

This paper contains 5 sections, 2 equations, 1 figure.

Figures (1)

  • Figure 1: Average quality score across Wikipedia languages and topics within the Top 10, Top 25, and Top 100 most cited articles. An interactive version of the charts is available at: https://data.lewoniewski.info/wikiworkshop2025