WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions

Zining Wang, Yuxuan Zhang, Dongwook Yoon, Nicholas Vincent, Farhan Samir, Vered Shwartz

TL;DR

WikiGap tackles the English-dominant paradigm in Wikipedia by surfacing cross-lingual knowledge gaps from non-English editions within the English interface. It combines the InfoGap sentence-level gap detector with a five-component UI (in-text markers, a cross-lingual fact panel, provenance cards, language filtering, and a search) to present multilingual facts with clear provenance and minimal reading disruption. In a mixed-methods study with 21 participants, WikiGap improved fact-finding accuracy and speed, boosted usability scores, and increased engagement with non-English content, while raising awareness of multilingual gaps and prompting considerations for cross-lingual editing. The work positions WikiGap as a boundary object that promotes epistemic equity and offers a blueprint for integrating provenance-aware, pluralistic content into reading interfaces and future AI-assisted knowledge systems.

Abstract

With more than 11 times as many pageviews as the next largest edition, English Wikipedia dominates global knowledge access relative to other language editions. Readers are prone to assuming that English Wikipedia is a superset of all language editions, leading many to prefer it even when their primary language is not English. Other language editions, however, comprise complementary facts rooted in their respective cultures and media environments, which are marginalized in English Wikipedia. While Wikipedia's user interface enables switching between language editions through its Interlanguage Link (ILL) system, it does not reveal to readers that other language editions contain valuable, complementary information. We present WikiGap, a system that surfaces complementary facts sourced from other Wikipedias within the English Wikipedia interface. Specifically, by combining a recent multilingual information-gap discovery method with a user-centered design, WikiGap enables access to complementary information from French, Russian, and Chinese Wikipedia. In a mixed-methods study (n=21), WikiGap significantly improved fact-finding accuracy, reduced task time, and received a 32-point higher usability score relative to Wikipedia's current ILL-based navigation system. Participants reported increased awareness of the availability of complementary information in non-English editions and reconsidered the completeness of English Wikipedia. WikiGap thus paves the way for improved epistemic equity across language editions.

Paper Structure

This paper contains 61 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: How the findings from our preliminary interviews (F1-F4) informed WikiGap's design requirements (R1-R3) and core design elements (D1-D5).
  • Figure 2: Knowledge differences in the Wikipedia coverage of Oolong identified by InfoGap. Connecting lines mark overlapping facts; green boxes highlight facts unique to each language edition. English sentences in italics represent translations of facts from Chinese or French Wikipedia, while non-italicized English sentences are from the original English Wikipedia article.
  • Figure 3: Overview of the InfoGap backend pipeline for cross-lingual fact alignment, reproduced and adapted from Samir et al. (2024). For additional technical details, see the original paper.
  • Figure 4: Overview of system implementation and data processing pipeline. Top: A high-level overview of the data stream in the WikiGap system. We adapted the InfoGap pipeline to support Chinese-language input alongside existing language pairs, followed by post-processing steps to standardize and merge datasets by topic. Orange process blocks indicate components we developed to enable proper integration and display of multilingual facts in the UI. Bottom: The data structure and rendering flow for an individual fact. This illustrates how each multilingual fact is transformed (through translation, alignment, tagging, and contextual linking) into an interactive component in the WikiGap interface.
  • Figure 5: Box plot showing the System Usability Scale (SUS) scores for each condition. Blue represents the control condition (no WikiGap), and orange represents the treatment condition (with WikiGap). Horizontal dashed lines represent standard usability benchmarks in varying shades of green: light green for Fair usability (SUS > 51), medium green for Good usability (SUS > 71), and dark green for Excellent usability (SUS > 86).
  • ...and 2 more figures