Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches

Pablo Biedma, Xiaoyuan Yi, Linus Huang, Maosong Sun, Xing Xie

TL;DR

This work investigates whether Large Language Models (LLMs) hold unique values beyond human norms. It proposes ValueLex, a framework that reconstructs LLM-specific value systems from scratch based on the Lexical Hypothesis. The method elicits value descriptors from 525 responses across 30+ LLMs, builds a three-dimension taxonomy (Competence, Character, and Integrity) via factor analysis and semantic clustering, and then applies projective tests to quantify value orientations. Key findings show a distinct LLM value structure, largely centered on Competence, with training and scaling effects modulating the emphasis across dimensions and with alignment broadening value diversity. The work provides an interdisciplinary pathway for AI alignment and governance, offering tools to diagnose and steer LLMs' ethical tendencies across model families and data regimes.

Abstract

Recent advancements in Large Language Models (LLMs) have revolutionized the AI field but also pose potential safety and ethical risks. Deciphering LLMs' embedded values is therefore crucial for assessing and mitigating these risks. Despite extensive investigation into LLMs' values, previous studies rely heavily on human-oriented value systems from the social sciences. A natural question then arises: Do LLMs possess unique values beyond those of humans? To investigate this question, this work proposes a novel framework, ValueLex, to reconstruct LLMs' unique value system from scratch, leveraging psychological methodologies from human personality/value research. Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs, synthesizing a taxonomy that culminates in a comprehensive value framework via factor analysis and semantic clustering. We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system. Based on this system, we further develop tailored projective tests to evaluate and analyze the value inclinations of LLMs across different model sizes, training methods, and data sources. Our framework fosters an interdisciplinary paradigm of understanding LLMs, paving the way for future AI alignment and regulation.
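
To make the grouping step more concrete, the following is a minimal sketch of how elicited value descriptors might be organized into candidate dimensions. It is not the paper's method: ValueLex uses factor analysis and semantic clustering, whereas this sketch substitutes a simple co-occurrence (Jaccard) score with single-linkage merging. The toy `responses`, the `jaccard` and `cluster` helpers, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each response is the set of value descriptors it elicited.
responses = [
    {"accuracy", "reliability", "helpfulness"},
    {"accuracy", "reliability"},
    {"empathy", "friendliness"},
    {"empathy", "friendliness", "helpfulness"},
    {"honesty", "transparency"},
    {"honesty", "transparency", "accuracy"},
]


def jaccard(a, b, responses):
    """Share of responses mentioning either descriptor that mention both."""
    both = sum(1 for r in responses if a in r and b in r)
    either = sum(1 for r in responses if a in r or b in r)
    return both / either if either else 0.0


def cluster(responses, threshold=0.5):
    """Single-linkage grouping: merge descriptors whose Jaccard score >= threshold.

    This is a stand-in for factor analysis / semantic clustering, not a
    reimplementation of either.
    """
    words = sorted(set().union(*responses))
    parent = {w: w for w in words}  # union-find forest over descriptors

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if jaccard(a, b, responses) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return sorted(groups.values())


clusters = cluster(responses)
for group in clusters:
    print(group)
```

On this toy input, descriptors that tend to appear in the same responses end up in the same group, while a descriptor spread thinly across several contexts stays on its own. A real pipeline would replace the co-occurrence score with factor loadings or embedding similarity.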

Paper Structure (18 sections, 4 equations, 7 figures, 2 tables, 2 algorithms)

Figures (7)

  • Figure 1: Illustration of the ValueLex framework. (a) Human value systems are not suitable for LLMs. (b) The generative value construction. (c) The projective value evaluation.
  • Figure 2: (a) Keyword clusters of all LLMs. (b) Value system established from all LLMs. (c) Keyword clusters of only vanilla PLMs. (d) Value system established from only PLMs.
  • Figure 3: Value evaluation results of LLMs. Higher scores indicate better value conformity.
  • Figure 4: Evaluation results using different value systems. Left: Schwartz’s Theory of Basic Human Values. Middle: LLM value system. Right: Moral Foundations Theory.
  • Figure 5: Correlation among subdimensions.
  • ...and 2 more figures