Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings
Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, Iryna Gurevych
TL;DR
This work examines whether multilingual LLMs can serve as culturally diverse reasoners by evaluating their ability to memorize and reason with proverbs across six languages using the MAPS dataset. MAPS combines proverbs, conversational contexts, and interpretation labels to test reasoning under cultural common ground, distinguishing memorization from genuine contextual understanding and exploring cross-cultural gaps via translations. Findings show that although memorization improves as models scale up, reasoning with figurative proverbs remains weak, and a pronounced culture gap appears when models reason about proverbs translated from other languages. The study releases MAPS to enable rigorous evaluation of cross-cultural reasoning in open-source mLLMs and highlights the need for culturally informed multilingual training and evaluation.
Abstract
Large language models (LLMs) are highly adept at question answering and reasoning tasks, but when reasoning in a situational context, human expectations vary depending on the relevant cultural common ground. As languages are associated with diverse cultures, LLMs should also be culturally-diverse reasoners. In this paper, we study the ability of a wide range of state-of-the-art multilingual LLMs (mLLMs) to reason with proverbs and sayings in a conversational context. Our experiments reveal that: (1) mLLMs "know" only a limited number of proverbs, and memorizing a proverb does not mean understanding it within a conversational context; (2) mLLMs struggle to reason with figurative proverbs and sayings, and also struggle when asked to select the wrong answer (instead of the correct answer); and (3) there is a "culture gap" in mLLMs when reasoning about proverbs and sayings translated from other languages. We construct and release our evaluation dataset MAPS (MulticulturAl Proverbs and Sayings) for proverb understanding with conversational context in six different languages.
