Cultural Commonsense Knowledge for Intercultural Dialogues
Tuan-Phong Nguyen, Simon Razniewski, Gerhard Weikum
TL;DR
This paper tackles the challenge of capturing cultural commonsense knowledge (CCSK) for intercultural dialogues. It introduces MANGO, a two-phase distillation pipeline that first generates CCSK assertions from LLMs via concept- and culture-driven prompts and then consolidates them through clustering to produce a large, high-quality knowledge base. MANGO yields 167K assertions across 30K concepts and 11K cultures, outperforming prior resources in both size and quality, as judged by human annotators. In an extrinsic evaluation on intercultural dialogue tasks, injecting MANGO assertions into prompts improves the specificity, cultural sensitivity, and overall quality of responses across multiple LLMs, demonstrating practical value for dialogue systems that handle cross-cultural interactions.
Abstract
Despite recent progress, large language models (LLMs) still face the challenge of appropriately reacting to the intricacies of social and cultural conventions. This paper presents MANGO, a methodology for distilling high-accuracy, high-recall assertions of cultural knowledge. We judiciously and iteratively prompt LLMs for this purpose from two entry points, concepts and cultures. Outputs are consolidated via clustering and generative summarization. Running the MANGO method with GPT-3.5 as the underlying LLM yields 167K high-accuracy assertions for 30K concepts and 11K cultures, surpassing prior resources by a large margin in quality and size. In an extrinsic evaluation for intercultural dialogues, we explore augmenting dialogue systems with cultural knowledge assertions. Notably, despite LLMs inherently possessing cultural knowledge, we find that adding knowledge from MANGO improves the overall quality, specificity, and cultural sensitivity of dialogue responses, as judged by human annotators. Data and code are available for download.
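The two entry points and the consolidation step described above can be illustrated with a small sketch. This is not the authors' implementation: the prompt wording, the function names (`concept_prompt`, `culture_prompt`, `consolidate`), the token-overlap similarity, and the 0.5 threshold are all assumptions made for illustration. In particular, the generative summarization step is replaced here by simply keeping the first member of each cluster, and no LLM is actually called.

```python
def concept_prompt(concept: str) -> str:
    """Concept-driven entry point (hypothetical wording)."""
    return (f"List cultural commonsense assertions about '{concept}' "
            "and name the culture each one applies to.")


def culture_prompt(culture: str) -> str:
    """Culture-driven entry point (hypothetical wording)."""
    return (f"List cultural commonsense assertions that hold for '{culture}' "
            "and name the concept each one is about.")


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity, a stand-in for a real similarity model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def consolidate(assertions, threshold=0.5):
    """Greedy clustering: an assertion joins the first cluster whose
    representative (its first member) is at least `threshold` similar.

    Returns (representative, cluster_size) pairs. Keeping the first member
    as representative is an assumption standing in for summarization.
    """
    clusters = []
    for text in assertions:
        for cluster in clusters:
            if jaccard(text, cluster[0]) >= threshold:
                cluster.append(text)
                break
        else:
            # No sufficiently similar cluster found: start a new one.
            clusters.append([text])
    return [(cluster[0], len(cluster)) for cluster in clusters]


# Made-up example assertions, as if returned by the two kinds of prompts.
raw = [
    "Tea is served with milk in Britain",
    "In Britain tea is served with milk",
    "Tipping is uncommon in Japan",
]
result = consolidate(raw)
```

In this toy run the two reworded tea assertions fall into one cluster and the tipping assertion forms its own. The cluster size could serve as a rough frequency signal for an assertion, though how the paper scores assertions is not stated in this summary.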
