Toward Human-Centered Readability Evaluation

Bahar İlgen, Georges Hattab

TL;DR

The paper argues that current NLP readability metrics inadequately capture human-centered aspects of health information, such as clarity, trust, tone, cultural relevance, and actionability. It introduces the Human-Centered Readability Score (HCRS), a five-dimensional framework that combines automatic metrics with structured human feedback within a participatory, human-in-the-loop pipeline. The authors define constructs for each dimension, present an empirical protocol for validating HCRS across diverse populations, and discuss integration with reinforcement learning from AI feedback (RLAIF), noting potential limitations. The work aims to realign health text simplification with user needs, so that NLP systems produce health communication that is not only simpler but also more usable, respectful, and actionable across varied communities.

Abstract

Text simplification is essential for making public health information accessible to diverse populations, including those with limited health literacy. However, commonly used evaluation metrics in Natural Language Processing (NLP), such as BLEU, FKGL, and SARI, mainly capture surface-level features and fail to account for human-centered qualities like clarity, trustworthiness, tone, cultural relevance, and actionability. This limitation is particularly critical in high-stakes health contexts, where communication must be not only simple but also usable, respectful, and trustworthy. To address this gap, we propose the Human-Centered Readability Score (HCRS), a five-dimensional evaluation framework grounded in Human-Computer Interaction (HCI) and health communication research. HCRS integrates automatic measures with structured human feedback to capture the relational and contextual aspects of readability. We outline the framework, discuss its integration into participatory evaluation workflows, and present a protocol for empirical validation. This work aims to advance the evaluation of health text simplification beyond surface metrics, enabling NLP systems that align more closely with diverse users' needs, expectations, and lived experiences.
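FKGL is a useful illustration of what "surface-level" means here. The Flesch-Kincaid Grade Level is 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59, so it sees only counts. The minimal Python sketch below computes it. The vowel-group syllable counter and the regex-based sentence and word splitting are crude assumptions. They are not the tokenization used by any particular readability tool.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): count runs of vowels (including 'y'),
    dropping a silent trailing 'e'. Not dictionary-accurate."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level computed purely from surface counts."""
    # Sentences: non-empty chunks between terminal punctuation marks.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Because only counts enter the formula, `fkgl("Take one tablet with water. Call your doctor if you feel dizzy.")` and `fkgl("Water with tablet one take. Dizzy feel you if doctor your call.")` return the same value. The scrambled text has identical word, sentence, and syllable counts, even though it no longer tells a reader what to do.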

Paper Structure

This paper contains 25 sections, 3 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: HCRS framework diagram
  • Figure 2: Illustrative comparison of original and simplified versions across five human-centered readability dimensions. Simplified B scores higher on trust, tone, and actionability, reflecting better alignment with user-centered design principles.
  • Figure 3: Human-in-the-loop readability evaluation
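The equations defining HCRS are not included in this summary, so the following is not the authors' formula. It is a minimal Python sketch of one way per-dimension scores like those compared in Figure 2 might combine automatic measures with structured human feedback. Every rule in it is an assumption:

- human ratings are on a 1-5 Likert scale and are normalized to [0, 1];
- automatic scores already lie in [0, 1] and may be missing for dimensions without an automatic proxy;
- a blending weight `alpha` mixes the two sources;
- the overall score is a weighted mean with equal default weights.

The names `dimension_scores`, `hcrs`, `profile_delta`, and `alpha` are hypothetical.

```python
from statistics import mean

# The five HCRS dimensions named in the paper.
DIMENSIONS = ("clarity", "trust", "tone", "cultural_relevance", "actionability")


def normalize_likert(rating, low=1, high=5):
    """Map a Likert rating (assumed 1-5 scale) onto [0, 1]."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} outside [{low}, {high}]")
    return (rating - low) / (high - low)


def dimension_scores(auto, human, alpha=0.5):
    """Hypothetical fusion rule, not the paper's.

    auto:  {dimension: automatic score in [0, 1]}; may omit dimensions
           (e.g. trust) that have no automatic proxy.
    human: {dimension: list of Likert ratings from participants}.
    alpha: weight on the automatic score where one exists.
    """
    scores = {}
    for dim in DIMENSIONS:
        human_part = mean(normalize_likert(r) for r in human[dim])
        if dim in auto:
            scores[dim] = alpha * auto[dim] + (1 - alpha) * human_part
        else:
            scores[dim] = human_part  # human feedback only
    return scores


def hcrs(scores, weights=None):
    """Weighted mean over the five dimensions; equal weights by default (assumption)."""
    weights = weights or {dim: 1.0 for dim in DIMENSIONS}
    total = sum(weights[dim] for dim in DIMENSIONS)
    return sum(weights[dim] * scores[dim] for dim in DIMENSIONS) / total


def profile_delta(version_a, version_b):
    """Per-dimension difference (b - a), e.g. original vs. a simplified version."""
    return {dim: version_b[dim] - version_a[dim] for dim in DIMENSIONS}
```

Under these assumptions, keeping the per-dimension profile next to the single `hcrs` value matters for a comparison like Figure 2. Two simplifications can reach similar averages while differing on trust, tone, or actionability, and `profile_delta` makes that difference visible.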