Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions

Saffron Huang, Esin Durmus, Miles McCain, Kunal Handa, Alex Tamkin, Jerry Hong, Michael Stern, Arushi Somani, Xiuruo Zhang, Deep Ganguli

TL;DR

The paper introduces a privacy-preserving, bottom-up framework for mapping the values expressed in real-world Claude interactions. It identifies 3,307 AI values and 2,483 human values and organizes them into a five-domain taxonomy (Personal, Protective, Practical, Social, Epistemic). Using large-scale conversation data, feature extraction prompts, and chi-square analyses, it shows that many AI values are strongly task- and context-dependent, while others, such as helpfulness and transparency, recur across contexts. Claude generally reinforces prosocial human values and adheres to high epistemic and ethical standards; value mirroring is common in supportive interactions, and explicit value articulation appears during resistance or reframing. The work provides a foundation for grounded evaluation and design of values in deployed AI systems, highlights context-driven variability, and points to avenues for developing AI-native value frameworks and more transparent governance.

Abstract

AI assistants can impart value judgments that shape people's decisions and worldviews, yet little is known empirically about what values these systems rely on in practice. To address this, we develop a bottom-up, privacy-preserving method to extract the values (normative considerations stated or demonstrated in model responses) that Claude 3 and 3.5 models exhibit in hundreds of thousands of real-world interactions. We empirically discover and taxonomize 3,307 AI values and study how they vary by context. We find that Claude expresses many practical and epistemic values, and typically supports prosocial human values while resisting values like "moral nihilism". While some values appear consistently across contexts (e.g. "transparency"), many are more specialized and context-dependent, reflecting the diversity of human interlocutors and their varied contexts. For example, "harm prevention" emerges when Claude resists users, "historical accuracy" when responding to queries about controversial events, "healthy boundaries" when asked for relationship advice, and "human agency" in technology ethics discussions. By providing the first large-scale empirical mapping of AI values in deployment, our work creates a foundation for more grounded evaluation and design of values in AI systems.

Paper Structure

This paper contains 46 sections, 13 figures, 11 tables.

Figures (13)

  • Figure 1: Our overall approach uses language models to extract AI values and other features from real-world conversations, taxonomizing and analyzing them to show how values manifest in different contexts.
  • Figure 2: Taxonomy of AI values. The top level shows all five high-level value clusters with their relative frequencies. We show selected examples of values from lower levels, collapsing the third level due to space constraints. More subtrees are in the appendix (values hierarchy results).
  • Figure 3: AI values most associated with different tasks (a) and human-expressed values (b). Bars represent adjusted Pearson residuals (higher values indicate stronger association), and are greyed out if the residual value is not significant (i.e. below 4.33).
  • Figure 4: The human values, AI values and tasks most associated with three key response types (strong support, reframing, and strong resistance), as determined by adjusted Pearson residuals. Note that the percentages shown don't sum to 100% as we present only three of the seven response types to highlight the most distinctive patterns.
  • Figure 5: Example subsection of the generated values hierarchy, focusing on the (dominant) practical and epistemic value categories.
  • ...and 8 more figures
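
The captions for Figures 3 and 4 rank associations by adjusted Pearson residuals. As a rough illustration of that statistic, the sketch below computes adjusted residuals for a small contingency table using the standard textbook formula. The function name and the counts are hypothetical and are not taken from the paper; only the 4.33 cutoff comes from the Figure 3 caption, and the sketch does not claim to reproduce the authors' pipeline.

```python
from math import sqrt

# Significance cutoff quoted in the Figure 3 caption.
SIGNIFICANCE_THRESHOLD = 4.33


def adjusted_pearson_residuals(table):
    """Return adjusted Pearson residuals for a 2-D table of counts.

    Standard formula: (O - E) / sqrt(E * (1 - row/N) * (1 - col/N)),
    where E = row * col / N. Assumes every row and column total is
    positive and smaller than the grand total N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = (
                expected
                * (1 - row_totals[i] / n)
                * (1 - col_totals[j] / n)
            )
            out.append((observed - expected) / sqrt(variance))
        residuals.append(out)
    return residuals


# Hypothetical counts: rows are two tasks, columns are whether a given
# AI value was expressed (yes, no). These numbers are made up.
counts = [[30, 10], [10, 30]]
for row in adjusted_pearson_residuals(counts):
    print([(round(r, 3), abs(r) >= SIGNIFICANCE_THRESHOLD) for r in row])
```

A positive residual means the cell occurs more often than independence would predict, and a negative one means less often. Under this reading, cells whose residual falls below the cutoff would correspond to the greyed-out bars in Figure 3.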