A Statistical Case Against Empirical Human-AI Alignment
Julian Rodemann, Esteban Garces Arias, Christoph Luther, Christoph Jansen, Thomas Augustin
TL;DR
This work argues against forward empirical human-AI alignment because it embeds statistical biases and anthropocentric constraints into deployed systems. It proposes prescriptive alignment and backward (empirical or prescriptive) alignment as alternatives, emphasizing transparency and population-aware reasoning. A concrete decoding case study shows prescriptive alignment can outperform empirically driven metrics like MAUVE in matching human preferences. The paper offers a fourfold taxonomy and discusses biases, reflexivity, and sampling concerns, aiming to guide the field toward principled, auditable alignment with broader generalization potential.
Abstract
Empirical human-AI alignment aims to make AI systems act in line with observed human behavior. While this goal is noble, we argue that empirical alignment can inadvertently introduce statistical biases that warrant caution. This position paper thus advocates against naive empirical alignment, offering prescriptive alignment and a posteriori empirical alignment as alternatives. We substantiate our principled argument with tangible examples such as human-centric decoding of language models.
