An Open Multilingual System for Scoring Readability of Wikipedia

Mykola Trokhymovych; Indira Sen; Martin Gerlach

TL;DR

This work tackles multilingual automatic readability assessment (ARA) for Wikipedia by building a novel dataset that aligns Wikipedia articles with simplified or children's encyclopedias across 14 languages and by training a single multilingual ranking model. The proposed TRank/SRank architecture uses a Siamese, margin-ranking objective with a multilingual masked language model (MLM) backbone to score individual texts on a continuous readability scale, achieving zero-shot ranking accuracy (RA) above $0.8$ across languages and strong correlations with language-specific readability measures ($\rho$ up to $-0.81$ with the Flesch-Kincaid Grade Level, FKGL). The authors demonstrate practical impact by analyzing readability across 24 Wikipedias, deploying a public API, and providing an open dataset to foster reproducibility and further research. They also discuss implications for editors, the role of children's encyclopedias, and avenues toward text simplification, highlighting both the potential and the limitations of current multilingual ARA in reducing information accessibility gaps. Overall, the paper delivers a scalable, open, multilingual framework for assessing readability at platform scale, with direct applications to knowledge equity and editorial tooling.
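To make the margin-ranking objective concrete, the following is a minimal sketch of a pairwise hinge loss of the kind commonly called a margin ranking loss. It is not the authors' implementation: the function name, the margin value of `0.5`, and the label convention (`y = 1` meaning the first text should score higher, i.e. be harder to read) are assumptions chosen for illustration.

```python
def margin_ranking_loss(s1: float, s2: float, y: float = 1.0, margin: float = 0.5) -> float:
    """Pairwise hinge loss on two predicted readability scores.

    s1, s2 : scores predicted by the two weight-sharing branches.
    y      : +1 if s1 should be ranked above s2, -1 otherwise (assumed convention).
    margin : minimum required gap between the scores (assumed value).
    """
    # The loss is zero once the scores are ordered correctly with a gap
    # of at least `margin`; otherwise it grows linearly with the violation.
    return max(0.0, -y * (s1 - s2) + margin)


# A pair ordered correctly with a large gap incurs no loss.
loss_correct = margin_ranking_loss(1.2, 0.1)

# The same pair with the scores swapped is penalised.
loss_swapped = margin_ranking_loss(0.1, 1.2)
```

Because the loss depends only on the difference between the two scores, training constrains the ordering of texts rather than their absolute values, which is what allows a single branch to be used afterwards to score one text at a time.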

Abstract

With over 60M articles, Wikipedia has become the largest platform for open and freely accessible knowledge. While it has more than 15B monthly visits, its content is believed to be inaccessible to many readers due to the lack of readability of its text. However, previous investigations of the readability of Wikipedia have been restricted to English only, and there are currently no systems supporting the automatic readability assessment of the 300+ languages in Wikipedia. To bridge this gap, we develop a multilingual model to score the readability of Wikipedia articles. To train and evaluate this model, we create a novel multilingual dataset spanning 14 languages by matching articles from Wikipedia to simplified Wikipedia and online children's encyclopedias. We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages and improving upon previous benchmarks. These results demonstrate the applicability of the model at scale for languages in which there is no ground-truth data available for model fine-tuning. Furthermore, we provide the first overview of the state of readability in Wikipedia beyond English.
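As an illustration of the ranking accuracy figure quoted above, the sketch below computes the fraction of matched article pairs in which the harder version receives the higher score. The exact definition used in the paper is not given in this summary, so this definition, the function name, and the example scores are assumptions.

```python
from typing import Iterable, Tuple


def ranking_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (hard_score, easy_score) pairs that are ordered correctly.

    Each pair holds the model scores for two versions of the same article:
    the original (hard) text and its simplified or children's (easy) version.
    Higher scores are taken to mean more difficult text.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("ranking accuracy is undefined for an empty set of pairs")
    correct = sum(1 for hard, easy in pairs if hard > easy)
    return correct / len(pairs)


# Made-up scores for five article pairs; the second pair is ranked incorrectly.
example_pairs = [(1.4, 0.2), (0.9, 1.1), (0.3, -0.5), (2.0, 0.7), (0.6, 0.1)]
accuracy = ranking_accuracy(example_pairs)  # 4 of 5 pairs correct
```

A metric of this form needs only relative labels (which version is easier), so it can be evaluated on any language for which matched pairs exist, without requiring absolute readability grades.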

Paper Structure (38 sections, 1 equation, 5 figures, 3 tables)

Introduction
Related work
Traditional approaches
Language models
Multilingual ARA
Readability in Wikipedia
Data
Dataset sources
Pre-processing
Model
Design requirements
Model architecture
Fine-tuning strategy
Technical implementation
Experimental evaluation
...and 23 more sections

Figures (5)

Figure 1: Sketch of the readability scoring system for Wikipedia articles. Higher scores indicate more difficult-to-read text.
Figure 2: Sketch of the model architecture consisting of two joint readability scoring models trained using a Margin Ranking Loss. $S_1$ and $S_2$ refer to the predicted scores of $Text_1$ and $Text_2$, respectively.
Figure 3: Distribution of model scores vs. FKGL for articles from the test set of simplewiki-en stratified by readability level: simplewiki (easy) and enwiki (hard).
Figure 4: Distribution of readability scores (from the TRank model) across different language editions of Wikipedia. Boxplots show median (red line) and 25- and 75-percentiles with whiskers ranging from 2.5- to 97.5-percentile.
Figure 5: Number of articles that occur in two or more different datasets (single occurrence is skipped).
