Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs
David Ifeoluwa Adelani, A. Seza Doğruöz, Iyanuoluwa Shode, Anuoluwapo Aremu
TL;DR
This paper examines how well West African Pidgin English (WAPE) and Nigerian-Pidgin (Naija) are represented in Generative AI. It introduces the Warri benchmark to compare WAPE and Naija across domains (BBC news and Wikipedia) and conducts statistical, zero-shot transfer, and prompting experiments with MT models and large language models to assess linguistic bias. The findings show a clear bias toward WAPE in current AI systems: Naija is underrepresented and hard to teach to models from limited data. Qualitative interviews with Naija Wikipedia contributors corroborate distinct writing conventions and the need for more diverse, domain-representative data. The work highlights the urgency of inclusive multilingual AI, especially for widely spoken but low-resource pidgins, and provides the Warri dataset for reproducible evaluation.
Abstract
Nigeria is a multilingual country with 500+ languages. Naija is a Nigerian Pidgin spoken by approximately 120M people, and it is a mixed language (drawing on, e.g., English, Portuguese, Yoruba, Hausa and Igbo). Although it has mainly been a spoken language until recently, some online platforms (e.g., Wikipedia) now publish in written Naija as well. West African Pidgin English (WAPE) is also spoken in Nigeria, and the BBC uses it to broadcast news on the internet to a wider audience, not only in Nigeria but also in other West African countries (e.g., Cameroon and Ghana). Through statistical analyses and Machine Translation experiments, our paper shows that these two pidgin varieties do not represent each other (i.e., there are linguistic differences in word order and vocabulary) and that Generative AI operates based only on WAPE. In other words, Naija is underrepresented in Generative AI, and it is hard to teach to LLMs with few examples. In addition to the statistical analyses, we also provide historical information on both pidgins as well as insights from interviews conducted with volunteer Wikipedia contributors in Naija.
