Are Generative Language Models Multicultural? A Study on Hausa Culture and Emotions using ChatGPT

Ibrahim Said Ahmad, Shiran Dudy, Resmi Ramachandranpillai, Kenneth Church

TL;DR

This study investigates whether generative language models like ChatGPT can accurately reflect the culture and emotions of speakers of Hausa, a low-resource language. It prompts ChatGPT with 37 culturally sensitive questions and compares its outputs to responses from 18 native Hausa speakers using emotion analysis and two similarity metrics (BERTScore and METEOR), complemented by human cultural-alignment ratings. Results indicate only partial cultural alignment: ChatGPT's outputs are largely neutral and semantically similar to human responses, yet they lack authentic Hausa phrasing and emotional diversity, revealing cultural gaps that likely stem from training data and fine-tuning. The work emphasizes the need for more diverse, inclusive data and evaluation methods (including crowd-truth approaches and human-in-the-loop feedback) to improve LLM performance for low-resource languages and in sensitive domains like health and education.

Abstract

Large Language Models (LLMs), such as ChatGPT, are widely used to generate content for various purposes and audiences. However, these models may not reflect the cultural and emotional diversity of their users, especially for low-resource languages. In this paper, we investigate how ChatGPT represents Hausa culture and emotions. We compare responses generated by ChatGPT with those provided by native Hausa speakers on 37 culturally relevant questions. We conduct experiments using emotion analysis and apply two similarity metrics to measure the alignment between human and ChatGPT responses. We also collect participants' ratings of and feedback on ChatGPT responses. Our results show that ChatGPT has some level of similarity to human responses, but also exhibits some gaps and biases in its knowledge and awareness of Hausa culture and emotions. We discuss the implications and limitations of our methodology and analysis and suggest ways to improve the performance and evaluation of LLMs for low-resource languages.

Paper Structure

This paper contains 12 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Emotion analysis for Participants and ChatGPT responses.
  • Figure 2: Median similarity scores between ChatGPT responses and the recorded human responses. For each prompt there is a single ChatGPT response and 18 human responses. Each ChatGPT response is compared with the human responses to the same prompt, and the median similarity score is recorded for each of the 37 prompts.
  • Figure 3: Participants' likeliness ratings of ChatGPT responses. On average, 8.2 participants find ChatGPT responses likely to be uttered by native speakers of Hausa, while 5.3 find these responses unlikely (the plot shows medians; the averages were computed separately).
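
The aggregation described in the Figure 2 caption (one ChatGPT response scored against every human response to the same prompt, then reduced to a median) can be illustrated with a minimal Python sketch. It is not the authors' code. The `token_f1` function is a hypothetical stand-in used only to keep the example self-contained; the study itself reports BERTScore and METEOR, which would be passed in through the `similarity` parameter. The toy English strings are invented for illustration and are not data from the paper.

```python
from statistics import median


def token_f1(candidate: str, reference: str) -> float:
    """Stand-in similarity (assumption): F1 over unique lowercase tokens.

    This is NOT BERTScore or METEOR; it only makes the sketch runnable.
    """
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def median_similarity(model_response, human_responses, similarity=token_f1):
    """Score one model response against each human response; return the median."""
    if not human_responses:
        raise ValueError("need at least one human response")
    return median(similarity(model_response, h) for h in human_responses)


# Hypothetical toy data: one prompt with three human responses
# (the study collects 18 human responses per prompt).
model_response = "thank you very much"
human_responses = ["thank you", "thank you so much", "we are grateful"]
score = median_similarity(model_response, human_responses)
```

Repeating this for each of the 37 prompts would give one median score per prompt, which is the quantity the caption says is plotted. The median is less sensitive than the mean to a single human response that happens to be very close to, or very far from, the model's wording.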