Heterogeneous Value Alignment Evaluation for Large Language Models

Zhaowei Zhang, Ceyao Zhang, Nian Liu, Siyuan Qi, Ziqi Rong, Song-Chun Zhu, Shuguang Cui, Yaodong Yang

TL;DR

The paper tackles the challenge of aligning LLMs with heterogeneous human values by introducing Heterogeneous Value Alignment Evaluation (HVAE). HVAE leverages Social Value Orientation (SVO) to characterize value systems and defines value rationality as the degree to which an agent's behavior aligns with a target value, formalized through a trajectory-to-value mapping and a distance metric. A prompting framework induces target values and self-generated goals, enabling automated, value-driven evaluation across tasks. The authors evaluate eight mainstream LLMs across four values, revealing a general bias toward prosocial and altruistic values and demonstrating the impact of goal prompting and model capabilities on value rationality. The work provides a scalable, automated approach to assess and potentially enhance LLM alignment with diverse value systems, with implications for safer and more context-aware AI deployment.

Abstract

The emergent capabilities of Large Language Models (LLMs) have made it crucial to align their values with those of humans. However, current methodologies typically attempt to assign value as an attribute to LLMs, yet pay little attention to the ability to pursue value and to the importance of transferring heterogeneous values in specific practical applications. In this paper, we propose a Heterogeneous Value Alignment Evaluation (HVAE) system, designed to assess the success of aligning LLMs with heterogeneous values. Specifically, our approach first adopts the Social Value Orientation (SVO) framework from social psychology, which corresponds to how much weight a person attaches to the welfare of others in relation to their own. We then assign the LLMs different social values and measure whether their behaviors align with the induced values. We conduct evaluations with a new automatic metric, value rationality, which represents the ability of LLMs to align with specific values. Evaluating the value rationality of five mainstream LLMs, we discern a propensity in LLMs towards neutral values over pronounced personal values. By examining the behavior of these LLMs, we contribute to a deeper insight into the value alignment of LLMs within a heterogeneous value system.
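
As a rough illustration of how SVO turns "weight on others' welfare relative to one's own" into a number, the sketch below computes an SVO angle from paired payoff allocations, following the general form of the SVO Slider Measure from the social psychology literature. The function names, the centering value of 50, and the category boundaries (57.15, 22.45, and -12.04 degrees) are assumptions taken from that literature, not details confirmed by this paper.

```python
import math


def svo_angle(self_payoffs, other_payoffs, center=50.0):
    """Return an SVO angle in degrees from paired (self, other) allocations.

    Assumption: payoffs are on a 0-100 scale centered at 50, as in the
    SVO Slider Measure; the paper's exact task format may differ.
    """
    if not self_payoffs or len(self_payoffs) != len(other_payoffs):
        raise ValueError("payoff lists must be non-empty and of equal length")
    mean_self = sum(self_payoffs) / len(self_payoffs)
    mean_other = sum(other_payoffs) / len(other_payoffs)
    # atan2 handles the case where mean_self equals the center (angle = 90).
    return math.degrees(math.atan2(mean_other - center, mean_self - center))


def classify_svo(angle_deg):
    """Map an angle to one of four categories (boundaries are assumptions)."""
    if angle_deg > 57.15:
        return "Altruistic"
    if angle_deg > 22.45:
        return "Prosocial"
    if angle_deg > -12.04:
        return "Individualistic"
    return "Competitive"


if __name__ == "__main__":
    # Equal, generous splits place the agent on the prosocial diagonal.
    angle = svo_angle([85, 85], [85, 85])
    print(round(angle, 2), classify_svo(angle))
```

An agent that always maximizes its own payoff while leaving the other at the midpoint lands at 0 degrees (Individualistic), whereas one that raises its own payoff at the other's expense falls below zero toward Competitive.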

Paper Structure

This paper contains 20 sections, 5 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: The SVO ring of Altruistic, Individualistic, Prosocial, and Competitive social values, which are represented with different colors.
  • Figure 2: The pipeline of the Heterogeneous Value Alignment Evaluation (HVAE) system. Given the target value for one LLM, the system first elicits a value prompt, then asks the LLM to answer several language-based tasks and explain its reasoning interactively. Based on the choices, the SVO slider measurement assesses the LLM's behavioral SVO. The degree of alignment between the actual behavioral value resulting from the LLM's decisions and its corresponding social value quantifies the LLM's value rationality.
  • Figure 3: Schematic overview of the experiment setup in the HVAE framework. Given four distinct social values as the prompt, the system first elicits a self-constructed goal prompt from the evaluated LLM for each social value, then asks the LLM to answer several SVO language-based choice tasks interactively. The self-constructed goal prompt is used as an enhancement technique to guide the model toward the decision that best fits the given social value at every response, which yields the real behavioral value.
  • Figure 4: Value rationality evaluation across eight mainstream LLMs. The four axes, A, C, I, and P represent four values: Altruistic, Competitive, Individualistic, and Prosocial.
  • Figure 5: SVOs of different LLMs across four different values: Altruistic, Competitive, Individualistic, and Prosocial. The red dotted lines represent the perfect SVOs for each value.
  • ...and 2 more figures
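
The captions for Figures 2 and 5 describe value rationality as how closely a model's behavioral SVO matches the "perfect" SVO of its target value. A minimal sketch of one such distance-based score follows. It is a hypothetical formulation, not the paper's definition: the target angles (90, 45, 0, and -45 degrees), the normalization by 180 degrees, and the names `TARGET_ANGLES` and `value_rationality` are all assumptions made for illustration.

```python
# Hypothetical "perfect" SVO angles in degrees for each target value.
# Assumption: these follow the idealized positions on the SVO ring.
TARGET_ANGLES = {
    "Altruistic": 90.0,
    "Prosocial": 45.0,
    "Individualistic": 0.0,
    "Competitive": -45.0,
}


def value_rationality(behavioral_angle_deg, target_value, max_distance=180.0):
    """Score in [0, 1]: 1 means behavior sits exactly on the target angle.

    Assumption: alignment is measured as a normalized angular distance;
    the paper's own metric may be defined differently.
    """
    target = TARGET_ANGLES[target_value]
    diff = abs(behavioral_angle_deg - target) % 360.0
    # Take the shorter way around the ring so the distance is at most 180.
    distance = min(diff, 360.0 - diff)
    return 1.0 - distance / max_distance


if __name__ == "__main__":
    # A model prompted to be Altruistic that behaves individualistically.
    print(value_rationality(0.0, "Altruistic"))
```

Under this sketch, a model whose behavioral angle sits on the opposite side of the ring from its target receives a score of 0, and scores rise linearly as the angle approaches the target.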