
Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models

Haoran Ye, Yuhang Xie, Yuanyi Ren, Hanjun Fang, Xin Zhang, Guojie Song

TL;DR

GPV introduces a data-driven, LLM-based framework that converts open-ended text into perception-level measurements of human values and extends this approach to evaluate the values of large language models. By training ValueLlama to assess the relevance and valence of each perception, GPV computes value weights and aggregates them into individual- and model-level value profiles, enabling context-specific and less biased measurement. Across human blogs and 17 LLMs, GPV demonstrates strong stability, construct validity, and better predictive utility for safety than self-reports and ValueBench, while showing that value systems such as VSM are more predictive than Schwartz's theory in AI contexts. The work offers a scalable, interpretable approach to psychometrics that can advance value-aligned AI research and large-scale sociological studies, while acknowledging multilingual and ethical considerations for future work.
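
The aggregation step summarized above can be sketched in a few lines of Python. This is a minimal illustration under assumptions not stated in the source: each perception receives, for each value, a relevance flag and a valence coded as +1 (supports), -1 (opposes), or 0 (neutral); these judgments stand in for ValueLlama's outputs and are hard-coded here. A value's weight is assumed to be the mean valence over the perceptions relevant to it, and a model-level profile is assumed to be the mean of per-output profiles. `PerceptionScore`, `value_profile`, and `model_profile` are hypothetical names, not the paper's API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PerceptionScore:
    """Hypothetical judgment for one (perception, value) pair,
    e.g. as produced by a ValueLlama-like scorer."""
    value: str
    relevant: bool
    valence: int  # assumed coding: +1 supports, -1 opposes, 0 neutral


def value_profile(scores: list[PerceptionScore]) -> dict[str, float]:
    """Assumed rule: a value's weight is the mean valence over the
    perceptions judged relevant to that value; irrelevant ones are skipped."""
    by_value: dict[str, list[int]] = {}
    for s in scores:
        if s.relevant:
            by_value.setdefault(s.value, []).append(s.valence)
    return {v: float(mean(vals)) for v, vals in by_value.items()}


def model_profile(profiles: list[dict[str, float]]) -> dict[str, float]:
    """Average per-output profiles into one model-level profile.
    A value missing from an output's profile does not count toward its mean."""
    pooled: dict[str, list[float]] = {}
    for prof in profiles:
        for v, w in prof.items():
            pooled.setdefault(v, []).append(w)
    return {v: float(mean(ws)) for v, ws in pooled.items()}


# Usage: perceptions extracted from one text, already scored per value.
scores = [
    PerceptionScore("Benevolence", True, 1),
    PerceptionScore("Benevolence", True, 1),
    PerceptionScore("Power", True, -1),
    PerceptionScore("Power", False, 0),  # not relevant, so ignored
    PerceptionScore("Security", True, 0),
]
print(value_profile(scores))
# {'Benevolence': 1.0, 'Power': -1.0, 'Security': 0.0}
print(model_profile([value_profile(scores), {"Power": 0.0}]))
# {'Benevolence': 1.0, 'Power': -0.5, 'Security': 0.0}
```

Averaging only over relevant perceptions keeps a value that a text never touches out of the profile instead of scoring it as neutral; whether GPV treats unmentioned values this way is an assumption of this sketch.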

Abstract

Human values and their measurement are a long-standing subject of interdisciplinary inquiry. Recent advances in AI have sparked renewed interest in this area, with large language models (LLMs) emerging as both tools and subjects of value measurement. This work introduces Generative Psychometrics for Values (GPV), an LLM-based, data-driven value measurement paradigm, theoretically grounded in text-revealed selective perceptions. The core idea is to dynamically parse unstructured texts into perceptions akin to static stimuli in traditional psychometrics, measure the value orientations they reveal, and aggregate the results. Applying GPV to human-authored blogs, we demonstrate its stability, validity, and superiority over prior psychological tools. Then, extending GPV to LLM value measurement, we advance the state of the art with 1) a psychometric methodology that measures LLM values based on their scalable and free-form outputs, enabling context-specific measurement; 2) a comparative analysis of measurement paradigms, revealing response biases in prior methods; and 3) an attempt to bridge LLM values and their safety, revealing the predictive power of different value systems and the impacts of various values on LLM safety. Through interdisciplinary efforts, we aim to leverage AI for next-generation psychometrics and psychometrics for value-aligned AI.

Paper Structure

This paper contains 56 sections, 1 equation, 8 figures, and 19 tables.

Figures (8)

  • Figure 1: Illustrations of the three measurement paradigms. (a) Self-reports require individuals to rate their agreement with expert-defined perceptions. (b) Dictionary-based methods count occurrences of expert-defined, value-related lexicon entries in text data. (c) GPV automatically and dynamically extracts perceptions from text data and learns to measure open-vocabulary values.
  • Figure 2: Two-dimensional MDS of individual values measured by GPV.
  • Figure 3: Comparative analysis of PVD (Ponizovskiy et al., 2020) and GPV: a case study.
  • Figure 4: Correlations between Schwartz values when using different measurement tools.
  • Figure 5: Correlations between Schwartz values when using different measurement tools with data centering.
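
Figures 4 and 5 compare correlations between Schwartz values with and without data centering. The captions do not specify the centering procedure; the sketch below assumes the within-subject centering commonly applied to Schwartz value scores, in which each subject's mean across all values is subtracted from each of that subject's scores to remove individual differences in overall scale use. The data are synthetic, and `center_within_subject` and `value_correlations` are hypothetical helper names.

```python
import numpy as np


def center_within_subject(scores: np.ndarray) -> np.ndarray:
    """Subtract each subject's (row's) mean across values from that row.

    Assumed layout: rows are subjects (e.g. blog authors or LLMs),
    columns are values (e.g. the ten Schwartz values).
    """
    return scores - scores.mean(axis=1, keepdims=True)


def value_correlations(scores: np.ndarray) -> np.ndarray:
    """Pearson correlation matrix between value columns across subjects."""
    return np.corrcoef(scores, rowvar=False)


# Synthetic data: 50 subjects x 10 values, plus a per-subject offset that
# mimics a uniform response bias (e.g. rating every value highly).
rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 10)) + rng.normal(size=(50, 1))
centered = center_within_subject(raw)  # every row now sums to zero

off_diag = ~np.eye(10, dtype=bool)
print("mean inter-value r, raw:     ", value_correlations(raw)[off_diag].mean())
print("mean inter-value r, centered:", value_correlations(centered)[off_diag].mean())
```

The shared offset inflates every raw inter-value correlation; centering removes it exactly, and because each centered row sums to zero, the centered correlations tend to be slightly negative on average.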

Theorems & Definitions (1)

  • Definition 3.1: Value Measurement