Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S

Christabel Acquaye; Haozhe An; Rachel Rudinger

Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S

Christabel Acquaye, Haozhe An, Rachel Rudinger

Abstract

Recent work has highlighted the culturally-contingent nature of commonsense knowledge. We introduce AMAMMER$ε$, a test set of 525 multiple-choice questions designed to evaluate the commonsense knowledge of English LLMs, relative to the cultural contexts of Ghana and the United States. To create AMAMMER$ε$, we select a set of multiple-choice questions (MCQs) from existing commonsense datasets and rewrite them in a multi-stage process involving surveys of Ghanaian and U.S. participants. In three rounds of surveys, participants from both pools are solicited to (1) write correct and incorrect answer choices, (2) rate individual answer choices on a 5-point Likert scale, and (3) select the best answer choice from the newly-constructed MCQ items, in a final validation step. By engaging participants at multiple stages, our procedure ensures that participant perspectives are incorporated both in the creation and validation of test items, resulting in high levels of agreement within each pool. We evaluate several off-the-shelf English LLMs on AMAMMER$ε$. Uniformly, models prefer answers choices that align with the preferences of U.S. annotators over Ghanaian annotators. Additionally, when test items specify a cultural context (Ghana or the U.S.), models exhibit some ability to adapt, but performance is consistently better in U.S. contexts than Ghanaian. As large resources are devoted to the advancement of English LLMs, our findings underscore the need for culturally adaptable models and evaluations to meet the needs of diverse English-speaking populations around the world.

Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S

Abstract

Recent work has highlighted the culturally-contingent nature of commonsense knowledge. We introduce AMAMMER

, a test set of 525 multiple-choice questions designed to evaluate the commonsense knowledge of English LLMs, relative to the cultural contexts of Ghana and the United States. To create AMAMMER

, we select a set of multiple-choice questions (MCQs) from existing commonsense datasets and rewrite them in a multi-stage process involving surveys of Ghanaian and U.S. participants. In three rounds of surveys, participants from both pools are solicited to (1) write correct and incorrect answer choices, (2) rate individual answer choices on a 5-point Likert scale, and (3) select the best answer choice from the newly-constructed MCQ items, in a final validation step. By engaging participants at multiple stages, our procedure ensures that participant perspectives are incorporated both in the creation and validation of test items, resulting in high levels of agreement within each pool. We evaluate several off-the-shelf English LLMs on AMAMMER

. Uniformly, models prefer answers choices that align with the preferences of U.S. annotators over Ghanaian annotators. Additionally, when test items specify a cultural context (Ghana or the U.S.), models exhibit some ability to adapt, but performance is consistently better in U.S. contexts than Ghanaian. As large resources are devoted to the advancement of English LLMs, our findings underscore the need for culturally adaptable models and evaluations to meet the needs of diverse English-speaking populations around the world.

Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S

Abstract

Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S

Abstract

Paper Structure

Table of Contents

Figures (25)