Do LLMs have Consistent Values?

Naama Rozen; Liat Bezalel; Gal Elidan; Amir Globerson; Ella Daniel

TL;DR

This work asks whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values and the correlations between values. It shows that, under a particular prompting strategy, the agreement with human data is quite compelling.

Abstract

Large language model (LLM) technology is constantly improving towards human-like dialogue. Values are a basic driving force underlying human behavior, but little research has been done to study the values exhibited in text generated by LLMs. Here we study this question by turning to the rich literature on value structure in psychology. We ask whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values and the correlations between values. We show that the results of this analysis depend on how the LLM is prompted, and that under a particular prompting strategy (referred to as "Value Anchoring") the agreement with human data is quite compelling. Our results both improve our understanding of values in LLMs and introduce novel methods for assessing consistency in LLM responses.
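As a rough illustration of the ranking comparison described above, the sketch below mean-centers questionnaire scores and computes a Spearman rank correlation between a human profile and a model profile. It is a minimal sketch, not the authors' code: the helper names (`mean_center`, `average_ranks`, `spearman`) are hypothetical, and the numbers are made-up placeholders rather than data from the paper.

```python
from statistics import mean

def mean_center(scores):
    """Subtract the profile's own mean rating from every value score."""
    m = mean(scores.values())
    return {name: s - m for name, s in scores.items()}

def average_ranks(xs):
    """Rank from 1 (smallest); tied entries share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Placeholder scores on a 1-6 scale (illustrative only, NOT from the paper).
human = {"Power": 2.4, "Achievement": 4.0, "Security": 4.4, "Benevolence": 5.1}
model = {"Power": 2.0, "Achievement": 4.5, "Security": 4.2, "Benevolence": 5.6}

values = list(human)
h, m = mean_center(human), mean_center(model)
rho = spearman([h[v] for v in values], [m[v] for v in values])
print(f"Spearman rho = {rho:.2f}")
```

Mean-centering does not change the rank order within a profile, so it leaves the Spearman correlation unchanged; it matters when raw scores from profiles with different baseline ratings are plotted on a common axis.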

Paper Structure (20 sections, 11 figures, 2 tables)

Introduction
Related Work
Method
Prompts
Data Analysis
Value Rankings
Correlations Between Values
Results
Value Rankings
Correlations Between Values
Understanding Value Anchoring
Discussion
Additional Files
Question Variants
Value Acronyms
...and 5 more sections

Figures (11)

Figure 1: Left: A heatmap of Spearman rank correlations between benchmark value hierarchies and dataset rankings for GPT-4, Gemini Pro, Llama 3.1 8B and 70B Instruct, and Gemma 2 9B and 27B across temperature conditions. Right: Average value scores for the Value Anchoring prompt at zero temperature. The x-axis shows values ordered according to human ranking (i.e., Power ranks lowest for humans and Benevolence ranks highest). The y-axis shows the mean-centered scores the models ascribe to these values in the questionnaire, with the human scores shown in red. Models tend to give lower scores to values that humans rank lower and higher scores to values that humans rank higher. The LLM scores also track the human scores (red curve) quite well.
Figure 2: Comparison of Procrustes analysis results between human data [schwartz2022measuring] and Gemini Pro for the Value Anchor and Names prompts at temperature 0.0. The sum of squared differences, which measures the fit to human data, is 0.11 for the Value Anchor prompt and 0.71 for the Names prompt, indicating a better fit for the Value Anchor prompt. For acronyms, refer to the Value Acronyms section.
Figure 3: Analysis of scores after value anchoring. The plot shows the average scores after shifting to the anchored value. The anchored value receives the highest score, as expected. More surprisingly, neighboring values receive similarly high scores, whereas more distant values receive lower scores.
Figure 4: Portrait Value Questionnaire—Revised - example items. The instructions provided were: "Here we briefly describe some people. Please read each description and think about how much each person is or is not like you. Tick the box to the right that shows how much the person in the description is like you". Rankings correspond to the following descriptions: 1-Not like me at all, 2-Not like me, 3-A little like me, 4-Somewhat like me, 5-Like me, 6-Very much like me.
Figure 5: A heatmap of Spearman rank correlation between benchmark value hierarchies and dataset rankings for Llama 3.1 8B and 70B instruct for batch versus serial prompting methods, across temperature conditions.
...and 6 more figures
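The sum of squared differences reported in the Figure 2 caption is the kind of quantity a Procrustes analysis produces. The following is a minimal sketch for two-dimensional configurations, assuming translation, uniform scaling, and rotation only (no reflection); it is not the authors' implementation, and the function names and toy coordinates are hypothetical. Points are encoded as complex numbers so that the best rotation and scale reduce to a single complex coefficient.

```python
import cmath
import math

def standardize(points):
    """Center a 2D configuration and scale it to unit sum of squares."""
    zs = [complex(x, y) for x, y in points]
    centroid = sum(zs) / len(zs)
    zs = [z - centroid for z in zs]
    norm = math.sqrt(sum(abs(z) ** 2 for z in zs))
    return [z / norm for z in zs]

def procrustes_disparity_2d(reference, target):
    """Sum of squared differences after optimally rotating and scaling
    the standardized target onto the standardized reference."""
    z = standardize(reference)
    w = standardize(target)
    # Least-squares complex coefficient: rotation (angle) and scale (modulus).
    c = sum(a * b.conjugate() for a, b in zip(z, w))
    return sum(abs(a - c * b) ** 2 for a, b in zip(z, w))

# Toy configurations (illustrative only, NOT data from the paper).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Same shape, rotated by 30 degrees, scaled by 3, and translated.
transform = cmath.rect(3, math.pi / 6)
moved = [
    ((complex(x, y) * transform).real + 5, (complex(x, y) * transform).imag - 2)
    for x, y in square
]

# Same points with two labels swapped, so the shape no longer matches.
swapped = [(0, 0), (1, 0), (0, 1), (1, 1)]

same_shape = procrustes_disparity_2d(square, moved)
different_shape = procrustes_disparity_2d(square, swapped)
print(same_shape, different_shape)
```

Under this standardization the disparity lies between 0 and 1, with 0 meaning the two configurations coincide after alignment. Library implementations may also allow reflections, which can yield a lower disparity than this rotation-only sketch.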
