Are Generative Language Models Multicultural? A Study on Hausa Culture and Emotions using ChatGPT
Ibrahim Said Ahmad, Shiran Dudy, Resmi Ramachandranpillai, Kenneth Church
TL;DR
This study investigates whether generative language models like ChatGPT can accurately reflect the culture and emotions of speakers of Hausa, a low-resource language. It prompts ChatGPT with 37 culturally sensitive questions and compares its outputs to responses from 18 native Hausa speakers using emotion analysis and two similarity metrics (BERTScore and METEOR), complemented by human cultural-alignment ratings. Results indicate only partial cultural alignment: ChatGPT's outputs are largely neutral and semantically similar to human responses, yet they lack authentic Hausa phrasing and emotional diversity, revealing cultural gaps likely due to training data and fine-tuning. The work emphasizes the need for more diverse, inclusive data and evaluation methods (including crowd-truth approaches and human-in-the-loop feedback) to improve LLM performance for low-resource languages and sensitive domains like health and education.
Abstract
Large Language Models (LLMs), such as ChatGPT, are widely used to generate content for various purposes and audiences. However, these models may not reflect the cultural and emotional diversity of their users, especially for low-resource languages. In this paper, we investigate how ChatGPT represents Hausa culture and emotions. We compare responses generated by ChatGPT with those provided by native Hausa speakers on 37 culturally relevant questions. We conduct experiments using emotion analysis and apply two similarity metrics to measure the alignment between human and ChatGPT responses. We also collect human participants' ratings of and feedback on ChatGPT's responses. Our results show that ChatGPT's responses bear some similarity to human responses, but also exhibit gaps and biases in knowledge and awareness of Hausa culture and emotions. We discuss the implications and limitations of our methodology and analysis and suggest ways to improve the performance and evaluation of LLMs for low-resource languages.
