How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators

Shang Liu; Hanzhao Wang; Zhongyao Ma; Xiaocheng Li

TL;DR

This work studies how to assess and incentivize the human annotators who produce language preference data for LLM alignment. Two challenges are central: annotator heterogeneity and the unclear link between annotation quality and downstream performance. The paper develops a probabilistic annotator model and a principal-agent framework with a continuous action space, and it considers two assessment methods (self-consistency monitoring and expert-based monitoring) and two contract forms (binary and linear). The theoretical results characterize the gap between the first-best and second-best solutions: $\Theta(1/\sqrt{n\log n})$ for binary contracts and $\Theta(1/n)$ for linear contracts, where $n$ is the number of samples used for performance assessment. The analysis also indicates that self-consistency monitoring outperforms expert-based monitoring under broad conditions. Empirical analysis on real preference datasets supports the theoretical claims and points to practical advantages of self-consistency monitoring and linear incentive schemes for improving data quality in RLHF/DPO-style alignment tasks.

Abstract

Human-annotated preference data play an important role in aligning large language models (LLMs). In this paper, we investigate how to assess the performance of human annotators and how to incentivize them to provide high-quality annotations. Quality assessment of language/text annotation faces two challenges: (i) the intrinsic heterogeneity among annotators, which rules out classic methods that assume the existence of an underlying true label; and (ii) the unclear relationship between annotation quality and downstream task performance, which makes it impossible to infer annotators' behavior from the performance of a model trained on the annotation data. We then formulate a principal-agent model to characterize the behaviors of, and the interactions between, the company and the human annotators. The model rationalizes the practical mechanism of a bonus scheme for incentivizing annotators, which benefits both parties, and it underscores the importance of having both an assessment system and a proper contract scheme in place. From a technical perspective, our analysis extends the existing literature on the principal-agent model by considering a continuous action space for the agent. We show that the gap between the first-best and the second-best solutions (under the continuous action space) is $\Theta(1/\sqrt{n \log n})$ for binary contracts and $\Theta(1/n)$ for linear contracts, where $n$ is the number of samples used for performance assessment; this contrasts with the known result of $\exp(-\Theta(n))$ for binary contracts when the action space is discrete. Throughout the paper, we use real preference annotation data to accompany our discussions.
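To give a feel for how differently the three rates quoted in the abstract shrink with $n$, the following minimal sketch tabulates them for a few sample sizes. It is only an illustration of orders of magnitude: the $\Theta(\cdot)$ notation hides constants that the abstract does not state, so every constant is set to 1 here as an assumption, and the function names are hypothetical rather than taken from the paper.

```python
import math


def binary_gap_rate(n: int) -> float:
    """Order of the gap for binary contracts with a continuous action space.

    Evaluates 1 / sqrt(n * log n); the hidden constant is assumed to be 1.
    """
    return 1.0 / math.sqrt(n * math.log(n))


def linear_gap_rate(n: int) -> float:
    """Order of the gap for linear contracts: 1 / n (constant assumed 1)."""
    return 1.0 / n


def discrete_binary_gap_rate(n: int) -> float:
    """Order of the gap for binary contracts with a discrete action space.

    Evaluates exp(-n); the constant inside the exponent is assumed to be 1.
    """
    return math.exp(-n)


def rate_table(sample_sizes):
    """Return (n, binary, linear, discrete) tuples for each sample size."""
    return [
        (n, binary_gap_rate(n), linear_gap_rate(n), discrete_binary_gap_rate(n))
        for n in sample_sizes
    ]


if __name__ == "__main__":
    for n, binary, linear, discrete in rate_table([10, 100, 1000, 10000]):
        print(
            f"n={n:>6}  binary={binary:.3e}  "
            f"linear={linear:.3e}  discrete={discrete:.3e}"
        )
```

With unit constants, the linear-contract rate falls below the binary-contract rate for every $n$ in the table, and the discrete-action rate falls far below both, which matches the ordering the abstract describes. The actual gaps depend on constants that this sketch does not model.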
