NaijaNLP: A Survey of Nigerian Low-Resource Languages

Isa Inuwa-Dutse

TL;DR

This paper addresses the scarcity of NLP resources for Nigeria's three major languages (Hausa, Igbo, Yorùbá) through a structured review of low-resource NLP (LR-NLP) literature, resources, and tools. It finds that only about 25.1% of the reviewed studies introduce new linguistic resources, which points to a reliance on repurposed data, and that diacritic and tonal representations remain under-explored. The survey synthesises formal grammar, language particularities, datasets, and downstream tasks, and proposes a dedicated resource hub plus strategies including resource enrichment, multilingual transfer, and shared tasks. The work underscores the need for language-specific data collection, open collaboration, and indigenous modelling to advance NaijaNLP, with implications for broader low-resource NLP.

Abstract

Nigeria has over 500 languages, of which three, Hausa, Yorùbá and Igbo, are spoken by over 175 million people and account for about 60% of the spoken languages. However, these languages are categorised as low-resource because the resources available to support tasks in computational linguistics are insufficient. Several research efforts and initiatives have been presented; however, a coherent understanding of the state of Natural Language Processing (NLP), from grammatical formalisation to linguistic resources that support complex tasks such as language understanding and generation, is lacking. This study presents the first comprehensive review of advancements in low-resource NLP (LR-NLP) research across the three major Nigerian languages (NaijaNLP). We quantitatively assess the available linguistic resources and identify key challenges. Although a growing body of literature addresses various downstream NLP tasks in Hausa, Igbo, and Yorùbá, only about 25.1% of the reviewed studies contribute new linguistic resources. This finding highlights a persistent reliance on repurposing existing data rather than generating novel, high-quality resources. Additionally, language-specific challenges, such as the accurate representation of diacritics, remain under-explored. To advance NaijaNLP and LR-NLP more broadly, we emphasise the need for intensified efforts in resource enrichment, comprehensive annotation, and the development of open collaborative initiatives.
