Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models
Haoran Ye, Yuhang Xie, Yuanyi Ren, Hanjun Fang, Xin Zhang, Guojie Song
TL;DR
GPV is a data-driven, LLM-based framework that converts open-ended text into perception-level measurements of human values and extends this approach to evaluate the values of large language models. Using ValueLlama, a model trained to assess the relevance and valence of each perception, GPV computes value weights and aggregates them into individual and model-level value profiles, enabling context-specific and less biased measurement. Across human blogs and 17 LLMs, GPV demonstrates strong stability, construct validity, and superior predictive utility for safety compared to self-reports and ValueBench, and it suggests that value theories such as VSM have greater predictive power than Schwartz's in AI contexts. The work offers a scalable, interpretable approach to psychometrics that can advance value-aligned AI research and large-scale sociological studies, and it acknowledges multilingual and ethical considerations for future work.
Abstract
Human values and their measurement are a long-standing interdisciplinary inquiry. Recent advances in AI have sparked renewed interest in this area, with large language models (LLMs) emerging as both tools and subjects of value measurement. This work introduces Generative Psychometrics for Values (GPV), an LLM-based, data-driven value measurement paradigm, theoretically grounded in text-revealed selective perceptions. The core idea is to dynamically parse unstructured texts into perceptions akin to static stimuli in traditional psychometrics, measure the value orientations they reveal, and aggregate the results. Applying GPV to human-authored blogs, we demonstrate its stability, validity, and superiority over prior psychological tools. Then, extending GPV to LLM value measurement, we advance the current art with 1) a psychometric methodology that measures LLM values based on their scalable and free-form outputs, enabling context-specific measurement; 2) a comparative analysis of measurement paradigms, indicating response biases of prior methods; and 3) an attempt to bridge LLM values and their safety, revealing the predictive power of different value systems and the impacts of various values on LLM safety. Through interdisciplinary efforts, we aim to leverage AI for next-generation psychometrics and psychometrics for value-aligned AI.
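The parse, measure, and aggregate steps described above can be illustrated with a minimal Python sketch of the final aggregation step only. It assumes that each perception has already been rated for relevance and valence with respect to a value (in GPV this rating would come from a model such as ValueLlama; here the ratings are hard-coded). The names `PerceptionRating`, `VALENCE_SCORE`, and `aggregate_values`, the "supports"/"opposes" labels, the example sentences, and the plain averaging rule are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed mapping from a valence label to a numeric score (hypothetical).
VALENCE_SCORE = {"supports": 1.0, "opposes": -1.0}


@dataclass
class PerceptionRating:
    """One (perception, value) judgment, as a rater model might produce it."""
    perception: str
    value: str
    relevant: bool
    valence: str  # "supports" or "opposes"; ignored when not relevant


def aggregate_values(ratings):
    """Average the valence scores of relevant perceptions, per value.

    Assumption: a simple mean stands in for whatever weighting the
    actual method uses.
    """
    scores = defaultdict(list)
    for r in ratings:
        if r.relevant:
            scores[r.value].append(VALENCE_SCORE[r.valence])
    return {value: sum(s) / len(s) for value, s in scores.items()}


# Hard-coded example ratings (invented for illustration).
ratings = [
    PerceptionRating("Helping neighbors matters to me", "benevolence", True, "supports"),
    PerceptionRating("Helping neighbors matters to me", "power", False, "supports"),
    PerceptionRating("Rules only hold people back", "conformity", True, "opposes"),
    PerceptionRating("I volunteer every weekend", "benevolence", True, "supports"),
]
profile = aggregate_values(ratings)
print(profile)
```

Values with no relevant perception (here, "power") are simply absent from the resulting profile, which is one way to reflect that the text reveals nothing about them.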
