Reasoning Shapes Alignment: Investigating Cultural Alignment in Large Reasoning Models with Cultural Norms

Yuhang Wang, Yanxu Zhu, Jitao Sang

TL;DR

The paper introduces CNCA, a framework for aligning large reasoning models with cultural norms. CNCA automatically mines norms from topic information and limited survey data, then applies them either through in-context prompts or through norm-enhanced CoT fine-tuning. It evaluates three mining methods (CNCA-T, CNCA-TQ-TA, CNCA-TQ-RA) across the two alignment paradigms, showing that stronger reasoning models derive greater gains and that norm-informed fine-tuning (CNCA-SFT, CNCA-DPO) can surpass standard supervised approaches and generalize to out-of-distribution cultural datasets. Key findings include the superiority of Topic + Top-1 Answer mining for in-context alignment, the importance of model reasoning capacity for effective norm utilization, and the demonstrated generalization of CNCA via variably tuned norms. Collectively, CNCA offers a scalable path to reflecting diverse human values in AI systems through culturally informed reasoning and supervision signals.

Abstract

The advanced reasoning capabilities of Large Reasoning Models enable them to thoroughly understand and apply safety policies through deliberate thought processes, thereby improving the models' safety. Beyond safety, these models must also be able to reflect the diverse range of human values across various cultures. This paper presents the Cultural Norm-based Cultural Alignment (CNCA) framework, which enables models to leverage their powerful reasoning ability to align with cultural norms. Specifically, we propose three methods to automatically mine cultural norms from limited survey data and explore ways to effectively utilize these norms for improving cultural alignment. Two alignment paradigms are examined: an in-context alignment method, where cultural norms are explicitly integrated into the user context, and a fine-tuning-based method, which internalizes norms through enhanced Chain-of-Thought training data. Comprehensive experiments demonstrate the effectiveness of these methods, highlighting that models with stronger reasoning capabilities benefit more from cultural norm mining and utilization. Our findings emphasize the potential for reasoning models to better reflect diverse human values through culturally informed alignment strategies.
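
The in-context alignment paradigm described above, in which cultural norms are placed directly in the user context, can be illustrated with a minimal sketch. This is not the authors' prompt template: the function name `build_norm_prompt`, the wording of the prompt, and the example norm and question are all hypothetical and chosen only to show the general shape of the idea.

```python
from typing import Sequence


def build_norm_prompt(
    country: str,
    norms: Sequence[str],
    question: str,
    options: Sequence[str],
) -> str:
    """Assemble a user message that lists mined norms before a survey question.

    Hypothetical helper: the layout and instructions are assumptions,
    not the template used in the paper.
    """
    # Number the norms so the model can refer to them while reasoning.
    norm_lines = "\n".join(
        f"{i}. {norm}" for i, norm in enumerate(norms, start=1)
    )
    # Label answer options (A), (B), (C), ...
    option_lines = "\n".join(
        f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options)
    )
    return (
        f"Cultural norms for {country}:\n{norm_lines}\n\n"
        f"Question: {question}\n{option_lines}\n\n"
        "Reason about which norms apply, then answer with one option letter."
    )


# Illustrative inputs only; the norm and question are made up for this sketch.
example_prompt = build_norm_prompt(
    country="Country X",
    norms=["Family obligations are generally placed ahead of personal goals."],
    question="How important is family in your life?",
    options=["Very important", "Rather important", "Not important"],
)
```

The resulting string would be sent as the user message to a reasoning model, which is then free to weigh the listed norms in its chain of thought before answering.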

Paper Structure

This paper contains 18 sections, 4 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: The framework of our proposed CNCA.
  • Figure 2: Three methods for cultural norm mining: Only Topic (T): Extract norms from the model using only topic information (a); Topic & Questionnaires (Top-1 Answer) (TQ (TA)): Extract norms using both topic information and questionnaire data by selecting top-1 answers (b$\to$d); Topic & Questionnaires (Ranked Answers) (TQ (RA)): Similar to TQ (TA), but based on ranked answers from the questionnaire data (c$\to$d). Note that questionnaires represent aggregated survey results from different countries. Methods TQ (TA) and TQ (RA) mine low-level cultural norms, which are then abstracted into higher-level norms.
  • Figure 3: Self-distillation data synthesis and fine-tuning framework based on cultural norms. $CN_{i}$ represents low-level norms, $R_{i}$ denotes reasoning, and $y^+$ and $y^-$ represent correct and incorrect responses with CoT, respectively.
  • Figure 4: The impact of the number of cultural norms (x-axis) on alignment performance (y-axis). The left and right plots correspond to the experimental results of CNCA-TQ (TA) and CNCA-TQ (RA), respectively.
  • Figure 5: Results of cultural alignment evaluations based on norms generated by various models. The x-axis represents the inference models, the y-axis indicates the alignment scores, and the colors distinguish norms originating from different models.