Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense
Siqi Shen, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Soujanya Poria, Rada Mihalcea
TL;DR
This work probes how large language models handle cultural commonsense across five cultures and five languages, evaluating both culture-specific knowledge and general commonsense in cultural contexts through QA, country prediction, verification, and association tasks. The authors systematically vary prompts, languages, and models, and find substantial cross-cultural performance gaps, language-dependent performance, and biases toward dominant cultures. Key findings are that English prompts often outperform native-language prompts, that Iran and Kenya appear underrepresented in training data, and that multilingual prompting yields uneven benefits across models. The study suggests strategies for improving cultural awareness in LLMs, including more diverse training data, translation-enabled prompting, and targeted data augmentation to mitigate biases. Overall, the work offers a benchmark-based view of cultural commonsense in contemporary LLMs, with practical implications for building more culturally aware AI systems.
Abstract
Large language models (LLMs) have demonstrated substantial commonsense understanding through numerous benchmark evaluations. However, their understanding of cultural commonsense remains largely unexamined. In this paper, we conduct a comprehensive examination of the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks. Using several general and cultural commonsense benchmarks, we find that (1) LLMs show a significant discrepancy in performance when tested on culture-specific commonsense knowledge for different cultures; (2) LLMs' general commonsense capability is affected by cultural context; and (3) the language used to query the LLMs can impact their performance on culture-related tasks. Our study points to the inherent bias in the cultural understanding of LLMs and provides insights that can help develop culturally aware language models.
