Does Claude's Constitution Have a Culture?

Parham Pourdavood

Abstract

Constitutional AI (CAI) aligns language models with explicitly stated normative principles, offering a transparent alternative to implicit alignment through human feedback alone. However, because constitutions are authored by specific groups of people, the resulting models may reflect particular cultural perspectives. We investigate this question by evaluating Anthropic's Claude Sonnet on 55 World Values Survey items, selected for high cross-cultural variance across six value domains and administered as both direct survey questions and naturalistic advice-seeking scenarios. Comparing Claude's responses to country-level data from 90 nations, we find that Claude's value profile most closely resembles those of Northern European and Anglophone countries, but on a majority of items extends beyond the range of all surveyed populations. When users provide cultural context, Claude adjusts its rhetorical framing but not its substantive value positions, with effect sizes indistinguishable from zero across all twelve tested countries. An ablation removing the system prompt increases refusals but does not alter the values expressed when responses are given, and replication on a smaller model (Claude Haiku) confirms the same cultural profile across model sizes. These findings suggest that when a constitution is authored within the same cultural tradition that dominates the training data, constitutional alignment may codify existing cultural biases rather than correct them, producing a value floor that surface-level interventions cannot meaningfully shift. We discuss the compounding nature of this risk and the need for globally representative constitution-authoring processes.

Paper Structure

This paper contains 30 sections, 6 figures, 5 tables.

Figures (6)

  • Figure 1: UMAP projection of Claude's value profile among 90 WVS Wave 7 countries. Colors represent four data-driven cultural clusters determined by $k$-means clustering on the full-dimensional standardized response profiles. Ellipses indicate approximate 95% confidence regions for each cluster. Although $k$-means assigns Claude to the nearest cluster, it sits at the periphery, separated from even its closest neighbors, reflecting its beyond-human extremity on culturally divisive items.
  • Figure 2: Pearson correlation between Claude's value profile and each of 89 WVS countries with sufficient item coverage. Countries are ordered by similarity, colored by Inglehart-Welzel cultural zone. The United States (14th) is less similar to Claude than several Northern European and Anglophone countries.
  • Figure 3: Hierarchical clustering dendrogram (Ward linkage, Euclidean distance) of Claude's value profile among WVS countries. Claude clusters with Protestant European and English-Speaking countries but joins the tree at a relatively high distance, reflecting its beyond-human extremity.
  • Figure 4: Response distributions for six high-variance items across contrasting countries (colored bars), with Claude's response marked by the black dashed line. On each item, Claude falls at or beyond the most progressive country's distribution.
  • Figure 5: Steerability heatmap showing the mean shift in Claude's coded value position (Format B) when country context is provided, broken down by value domain and country. Values near zero (white) indicate no change from baseline; positive values (red) indicate shifts toward one end of the WVS scale; negative values (blue) toward the other. The overwhelming pattern is near-zero shift across all domain-country combinations.
  • ...and 1 more figures
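
The similarity ranking described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the paper's code: the function names, the `min_items` coverage threshold, and the toy profiles below are all assumptions made for illustration, and the caption does not state how "sufficient item coverage" was defined.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either profile is constant
    return cov / (sx * sy)


def rank_by_similarity(model_profile, country_profiles, min_items=3):
    """Rank countries by Pearson r with the model profile.

    Items a country did not answer are marked None and dropped pairwise.
    Countries with fewer than `min_items` shared items are skipped
    (hypothetical threshold, not taken from the paper).
    """
    ranked = []
    for country, profile in country_profiles.items():
        pairs = [(m, c) for m, c in zip(model_profile, profile) if c is not None]
        if len(pairs) < min_items:
            continue
        xs, ys = zip(*pairs)
        r = pearson(xs, ys)
        if not math.isnan(r):
            ranked.append((country, r))
    # Most similar country first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy, made-up item means on a 0-1 scale (not WVS data).
model = [0.9, 0.1, 0.8, 0.2]
countries = {
    "A": [0.7, 0.3, 0.65, 0.35],   # same pattern as the model, compressed
    "B": [0.1, 0.9, 0.2, 0.8],     # mirror image of the model
    "C": [0.5, None, None, None],  # too few answered items
}
ranking = rank_by_similarity(model, countries)
```

In this toy case country A ranks first, B last, and C is excluded for insufficient coverage, which mirrors how one country out of 90 could drop out of an 89-country comparison. Pearson correlation captures the shape of a profile rather than its level, so a profile that is more extreme than a country's but ordered the same way can still correlate highly with it.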