Value Alignment Tax: Measuring Value Trade-offs in LLM Alignment

Jiajun Chen; Hua Shen

TL;DR

This work introduces Value Alignment Tax (VAT) to quantify how alignment efforts reshape interdependent human values in LLMs, moving beyond static, target-centric evaluations. By modeling value states from context-conditioned judgments and analyzing their co-variation through gain-normalized metrics and coupling matrices, the authors reveal structured, system-level shifts and identify coordination hubs within the Schwartz value circumplex. They develop a sequential, two-stage data construction pipeline and demonstrate across four models and multiple alignment strategies that similar on-target gains can produce divergent alignment taxes and stability profiles. The findings highlight systemic risks and provide a framework for tax-aware alignment, with implications for safer, more controllable deployment of LLMs in normative domains.
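
The TL;DR refers to coupling matrices over value co-variation, and Figure 2 reports top-$|R_{uv}|$ edges. However, the exact definition of $R_{uv}$ is not given in this section. The sketch below is a minimal illustration, not the authors' method. It assumes that $R_{uv}$ is the Pearson correlation, across scenarios, between the post-minus-pre shifts of values u and v. The helpers `pearson`, `coupling_matrix`, and `top_edges` are hypothetical, and the shift numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of shifts."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Assumption: a value whose shift never varies is treated as uncoupled.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coupling_matrix(shifts):
    """Hypothetical R[u][v]: correlation of per-scenario shifts of values u and v.

    `shifts` maps each value name to its list of post-minus-pre shifts,
    with scenarios in the same order for every value.
    """
    names = list(shifts)
    return {u: {v: pearson(shifts[u], shifts[v]) for v in names} for u in names}


def top_edges(R, k):
    """The k unordered value pairs with the largest |R[u][v]| (diagonal excluded)."""
    names = list(R)
    pairs = [(u, v, R[u][v]) for i, u in enumerate(names) for v in names[i + 1:]]
    return sorted(pairs, key=lambda e: abs(e[2]), reverse=True)[:k]


# Illustrative per-scenario shifts for four Schwartz values (made-up numbers).
shifts = {
    "Power": [-1.0, -2.0, -0.5, -1.5],
    "Achievement": [-0.5, -1.0, -0.25, -0.75],
    "Benevolence": [0.5, 1.0, 0.25, 0.75],
    "Tradition": [0.1, -0.1, 0.1, -0.1],
}
R = coupling_matrix(shifts)
for u, v, r in top_edges(R, 3):
    print(f"{u} - {v}: {r:+.2f}")
```

In this toy input, Achievement moves in lockstep with Power and Benevolence moves against both. As a result, those three pairs outrank every edge involving Tradition. A sign-aware chord diagram like the one in Figure 2 would color the first pair as positive and the other two as negative.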

Abstract

Existing work on value alignment typically characterizes value relations statically, ignoring how interventions (such as prompting, fine-tuning, or preference optimization) reshape the broader value system. We introduce the Value Alignment Tax (VAT), a framework that measures how alignment-induced changes propagate across interconnected values relative to the achieved on-target gain. VAT thereby captures the dynamics of value expression under alignment pressure. Using a controlled scenario-action dataset grounded in Schwartz value theory, we collect paired pre- and post-alignment normative judgments and analyze alignment effects across models, values, and alignment strategies. Our results show that alignment often produces uneven, structured co-movement among values. These effects are invisible under conventional target-only evaluation. They reveal systemic, process-level alignment risks and offer new insights into the dynamics of value alignment in LLMs.
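
The abstract says that VAT measures propagated change "relative to the achieved on-target gain", but this section does not give the formula. The following is a minimal sketch of one gain-normalized ratio and is not the paper's definition. It makes three assumptions. First, each value has a mean Likert judgment before and after alignment. Second, the shift is post minus pre. Third, the tax is the total absolute off-target shift divided by the on-target gain in the steering direction. The functions `shift_vector` and `gain_normalized_tax` and all scores are hypothetical.

```python
def shift_vector(pre, post):
    """Per-value shift: post-alignment minus pre-alignment mean Likert score."""
    if pre.keys() != post.keys():
        raise ValueError("pre and post must cover the same values")
    return {v: post[v] - pre[v] for v in pre}


def gain_normalized_tax(pre, post, target, direction=1.0, eps=1e-9):
    """Hypothetical tax: total absolute off-target shift per unit of on-target gain.

    direction=+1.0 when the target value is promoted, -1.0 when it is suppressed.
    """
    delta = shift_vector(pre, post)
    gain = direction * delta[target]
    if gain <= eps:
        # No movement in the steering direction, so the ratio is undefined.
        raise ValueError("no positive on-target gain")
    off_target = sum(abs(d) for v, d in delta.items() if v != target)
    return off_target / gain, delta


# Illustrative mean Likert scores (made-up numbers), suppressing Power.
pre = {"Power": 4.0, "Achievement": 3.5, "Benevolence": 3.0, "Universalism": 3.2}
post = {"Power": 2.5, "Achievement": 3.0, "Benevolence": 3.4, "Universalism": 3.2}
tax, delta = gain_normalized_tax(pre, post, target="Power", direction=-1.0)
print(round(tax, 3))  # 0.9 off-target units per 1.5 units of gain -> 0.6
```

Dividing by the gain makes the comparison across interventions meaningful. Two alignment runs with the same total off-target movement are penalized differently if one of them achieves twice the on-target gain.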

Paper Structure (67 sections, 12 equations, 13 figures, 6 tables)

Introduction
Problem Formulation and Desiderata
Problem Statement
From Likert Judgments to Shift Vectors.
Definition of Value Alignment Tax.
Value Alignment Tax Framework
Gain-Normalized Deviation.
Two-Level Measurements of Value Alignment Tax
Value-Level Alignment Tax.
System-Level Alignment Tax.
Tax Centralization.
Data Construction
Sequential Two-Stage Design.
Stage I: Scenario Generation.
Stage II: Value-Conditioned Action Generation.
...and 52 more sections

Figures (13)

Figure 1: Illustration of Value Alignment Tax. Traditional trait-level evaluation reports independent value scores, whereas VAT elicits state-level value configurations and models values as a relational system, revealing alignment-induced trade-offs. Edge direction denotes influence; width indicates trade-off magnitude.
Figure 2: Value-level alignment coupling under different steering objectives. Top row: normalized VAT$(v)$/nVAT profiles (radar plots) showing value participation strength under each steering objective. Bottom row: corresponding value-value coupling structures (chord diagrams showing the top-$|R_{uv}|$ edges, 8-shot). Red indicates strong positive coupling; blue indicates strong negative coupling.
Figure 3: Trade-off between target value gain and system-level alignment tax (nVAT) across SFT and DPO checkpoints when suppressing Power. Dashed lines indicate Pareto-efficient alignment regimes.
Figure 4: Value-level alignment tax projected onto the Schwartz circumplex. Colors denote steered values, line styles indicate alignment strength, and node opacity reflects stability across shots.
Figure 5: Alignment-induced risk amplification. Distribution of value-level amplification (VAT$(v)$) for coordination hubs (high-VAT values) and non-hubs under different steering objectives (GPT-4o, 8-shot).
...and 8 more figures
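
Figure 3 plots target value gain against system-level tax (nVAT) across SFT and DPO checkpoints, and it marks Pareto-efficient regimes with dashed lines. The sketch below shows standard Pareto-dominance filtering over such points. It assumes that each checkpoint is summarized by a single (gain, nVAT) pair, where higher gain and lower tax are both preferred. The paper's exact procedure for drawing the dashed lines is not stated here. The checkpoint names and numbers are hypothetical and are not results from the paper.

```python
def pareto_front(checkpoints):
    """Names of checkpoints not dominated in (higher gain, lower tax).

    `checkpoints` is a list of (name, target_gain, system_tax) tuples. A checkpoint
    is dominated if another one has gain >= and tax <= with at least one strict.
    """
    front = []
    for name, gain, tax in checkpoints:
        dominated = any(
            g >= gain and t <= tax and (g > gain or t < tax)
            for _, g, t in checkpoints
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative (gain, tax) pairs for hypothetical SFT and DPO checkpoints.
checkpoints = [
    ("sft-1", 0.4, 0.2),
    ("sft-2", 0.8, 0.5),
    ("dpo-1", 0.6, 0.2),
    ("dpo-2", 0.8, 0.7),
    ("dpo-3", 1.0, 0.9),
]
print(pareto_front(checkpoints))  # ['sft-2', 'dpo-1', 'dpo-3']
```

In this toy example, "dpo-2" reaches the same gain as "sft-2" but pays a higher tax, so it falls off the front. This mirrors the paper's point that similar on-target gains can come with different alignment taxes.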
