AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models

Hankun Kang; Di Lin; Zhirong Liao; Pengfei Bai; Xinyi Zeng; Jiawei Jiang; Yuanyuan Zhu; Tieyun Qian

TL;DR

This work proposes a dataset-construction framework that combines curation of authoritative cultural knowledge descriptions, LLM-automated query generation, and extensive manual verification. It also presents a knowledge-grounded method that significantly enhances cultural safety by enforcing the integration of knowledge into the LLM response generation process.

Abstract

With the widespread adoption of Large Language Models (LLMs), respecting indigenous cultures has become essential to models' cultural safety and to responsible global applications. Existing studies consider cultural safety and cultural knowledge separately and neglect that the former should be grounded in the latter. This severely limits the ability of LLMs to yield culture-specific, respectful responses, so adaptive cultural safety remains a formidable task. In this work, we propose to jointly model cultural safety and cultural knowledge. The key prerequisite for this research is data that pairs cultural safety with cultural knowledge. However, cultural diversity across regions and the subtlety of cultural differences make such paired evaluation data difficult to create. To address this issue, we propose a novel framework that integrates curation of authoritative cultural knowledge descriptions, LLM-automated query generation, and heavy manual verification. With it, we obtain a dataset named AdaCultureSafe, containing 4.8K manually decomposed fine-grained cultural descriptions and the corresponding 48K manually verified safety- and knowledge-oriented queries. Using this dataset, we evaluate three families of popular LLMs on their cultural safety and knowledge proficiency and make a critical discovery: no significant correlation exists between the two. We then examine utility-related neuron activations within the LLMs to investigate the potential cause of this absence of correlation, which can be attributed to the difference between the objectives of pre-training and post-alignment. Finally, we present a knowledge-grounded method, which significantly enhances cultural safety by enforcing the integration of knowledge into the LLM response generation process.
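To make the correlation finding concrete, the following is a minimal sketch of how a correlation between per-country knowledge and safety scores could be computed. It is not the authors' evaluation code. The choice of the Pearson coefficient, the `pearson` helper, the country keys, and all score values are assumptions introduced here for illustration; the abstract does not state which statistic the paper uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-country scores: placeholder values, NOT results from the paper.
knowledge = {"US": 0.80, "JP": 0.60, "BR": 0.70, "IN": 0.50}
safety = {"US": 0.55, "JP": 0.65, "BR": 0.50, "IN": 0.60}

# Align the two score lists on the same country order before correlating.
countries = sorted(knowledge)
r = pearson([knowledge[c] for c in countries], [safety[c] for c in countries])
print(f"Pearson r = {r:.3f}")
```

A coefficient near zero across countries would be consistent with the reported absence of correlation. A significance test, or a rank-based statistic such as Spearman's, would be a natural extension of this sketch.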

Paper Structure (18 sections, 6 equations, 11 figures, 5 tables)

Introduction
Related Work
Cultural Knowledge in LLMs
Cultural Safety of LLMs
Construction of AdaCultureSafe
Cultural Knowledge Descriptions Collection
LLM-automated Query Generation
Human Verification
Evaluating LLMs with AdaCultureSafe
Experiments
Experimental Setup
Evaluation and Probing Analysis
Cultural-Knowledge-Grounded Method
Conclusion
Limitations and Ethical Considerations
...and 3 more sections

Figures (11)

Figure 1: Comparison between existing studies and our work. Existing studies use data that is not paired by cultural topic and so omit joint analysis of safety and knowledge on the same topics, which is our focus.
Figure 2: The construction framework of AdaCultureSafe.
Figure 3: Sample content and structure of AdaCultureSafe.
Figure 4: Trends in the performance of cultural safety and cultural knowledge across different countries. Left: Llama3.1-8B. Center: Mistral-7B. Right: Qwen2.5-7B. The country names are abbreviated with ISO 3166-1 codes.
Figure 5: Correlation between cultural safety and knowledge in LLMs. Left: Llama3.1-8B. Center: Mistral-7B. Right: Qwen2.5-7B.
...and 6 more figures
