The Ghanaian NLP Landscape: A First Look
Sheriff Issaka, Zhaoyi Zhang, Mihir Heda, Keyi Wang, Yinka Ajibola, Ryan DeMar, Xuefeng Du
TL;DR
The Ghanaian NLP Landscape paper addresses the underrepresentation of Ghanaian languages in AI by conducting the first region-specific systematic review of NLP and machine translation (MT) work. It analyzes 12 studies on Ghanaian languages, covering data sources (religious texts, crowdsourcing, web scraping), model choices (mostly Transformer-based, with selective LSTM usage), and evaluation practices (BLEU, chrF, TER), then lays out a practical roadmap for data collection, model development, and robust evaluation. Key contributions include a granular, region-focused synthesis of datasets, architectures, and metrics, plus concrete recommendations for data diversification, bilingual/multilingual transfer, and ethically grounded research practices. The study’s significance lies in providing a foundational resource to guide inclusive AI, supporting the preservation of linguistic heritage and empowering Ghanaian language communities through accessible NLP tools and benchmarks.
Abstract
Despite comprising one-third of global languages, African languages are critically underrepresented in Artificial Intelligence (AI), threatening linguistic diversity and cultural heritage. Ghanaian languages, in particular, face an alarming decline, with documented extinctions and several languages at risk. This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages, identifying the methodologies, datasets, and techniques employed. Additionally, we create a detailed roadmap outlining challenges, best practices, and future directions, aiming to improve accessibility for researchers. This work serves as a foundational resource for Ghanaian NLP research and underscores the critical need for integrating global linguistic diversity into AI development.
