HEDS 3.0: The Human Evaluation Data Sheet Version 3.0
Anya Belz, Craig Thomson
TL;DR
HEDS 3.0 delivers a standardized datasheet template for documenting human evaluation experiments in NLP, addressing reproducibility and cross-study comparability. The paper articulates a five-section structure, links core questions to established practices, and provides a complete software package (online form, guidance, and LaTeX export) to streamline reporting. By mapping quality criteria to the QCET taxonomy and detailing preregistration and ethics considerations, it enables rigorous, transparent evaluation reporting. The practical impact is improved reproducibility, easier preregistration, and more consistent, machine-readable documentation of human evaluation studies in NLP.
Abstract
This paper presents version 3.0 of the Human Evaluation Datasheet (HEDS). This update is the result of our experience using HEDS in numerous recent human evaluation experiments, including reproduction studies, and of feedback received. Our main goal was to improve clarity and to enable users to complete the datasheet more consistently and comparably. The HEDS 3.0 package consists of the digital datasheet, documentation, and code for exporting completed datasheets as LaTeX files, all available from the HEDS GitHub repository.
