Low-resourced Languages and Online Knowledge Repositories: A Need-Finding Study
Hellina Hailu Nigatu, John Canny, Sarah E. Chasins
TL;DR
This study investigates challenges faced by contributors using Online Knowledge Repositories (OKRs) for low-resourced Ethiopian languages (Afan Oromo, Amharic, Tigrinya). It employs two empirical methods, a forum analysis of Wikipedia Talk Pages and a contextual inquiry with 14 novice contributors, to uncover how language scripts, limited resources, and socio-political factors impede content creation. Key findings show struggles with non-Latin input, misspellings, translation quality, limited scholarly sources, and interface barriers, all of which constrain article quantity and quality. The work offers design opportunities to improve Wikipedia interfaces, information retrieval, machine translation, and input modalities, with an emphasis on preserving linguistic and cultural agency. Overall, the paper argues for decolonial, community-centered technology design to empower low-resourced language speakers to preserve and share knowledge in their own languages.
Abstract
Online Knowledge Repositories (OKRs) like Wikipedia offer communities a way to share and preserve information about themselves and their ways of living. However, for communities with low-resourced languages, including most African communities, the quality and volume of available content are often inadequate. One reason for this lack of adequate content could be that many OKRs embody Western ways of knowledge preservation and sharing, requiring many low-resourced language communities to adapt to new interactions. To understand the challenges faced by low-resourced language contributors on the popular OKR Wikipedia, we conducted (1) a thematic analysis of Wikipedia forum discussions and (2) a contextual inquiry study with 14 novice contributors. We focused on three Ethiopian languages: Afan Oromo, Amharic, and Tigrinya. Our analysis revealed several recurring themes; for example, contributors struggle to find resources to corroborate their articles in low-resourced languages, and language technology support, such as translation systems and spellcheck, introduces errors that waste contributors' time. We hope our study will support designers in making online knowledge repositories accessible to low-resourced language speakers.
