A Statistical Case Against Empirical Human-AI Alignment

Julian Rodemann, Esteban Garces Arias, Christoph Luther, Christoph Jansen, Thomas Augustin

TL;DR

This work argues against forward empirical human-AI alignment because it embeds statistical biases and anthropocentric constraints into deployed systems. It proposes prescriptive alignment and backward (empirical or prescriptive) alignment as alternatives, emphasizing transparency and population-aware reasoning. A concrete decoding case study shows prescriptive alignment can outperform empirically driven metrics like MAUVE in matching human preferences. The paper offers a fourfold taxonomy and discusses biases, reflexivity, and sampling concerns, aiming to guide the field toward principled, auditable alignment with broader generalization potential.

Abstract

Empirical human-AI alignment aims to make AI systems act in line with observed human behavior. While noble in its goals, we argue that empirical alignment can inadvertently introduce statistical biases that warrant caution. This position paper thus advocates against naive empirical alignment, offering prescriptive alignment and a posteriori empirical alignment as alternatives. We substantiate our principled argument with tangible examples such as human-centric decoding of language models.

Paper Structure

This paper contains 25 sections, 10 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Number of arXiv paper uploads per year with alignment-related keywords (2019-2024).
  • Figure 2: Limits of the human-observable universe: The Hubble eXtreme Deep Field, showing roughly $5500$ galaxies. Source: https://esahubble.org/images/heic1214a/ (accessed 04/15/25)

Theorems & Definitions (3)

  • Definition 2.1: Human-AI Alignment
  • Definition 2.2: Prescriptive Human-AI Alignment
  • Definition 2.3: Empirical Human-AI Alignment