Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation

Junyu Luo, Zifei Zheng, Hanzhong Ye, Muchao Ye, Yaqing Wang, Quanzeng You, Cao Xiao, Fenglong Ma

TL;DR

Low health literacy makes clinical language hard for many patients to follow, so reliable simplification methods are needed. This work introduces MedLane, a public benchmark dataset with aligned sentence-level and term-level annotations drawn from de-identified MIMIC-III notes, and DECLARE, a three-module architecture (CWL, DNLI, RSSP) that localizes complex terms, substitutes them via a dictionary-driven lexical interpreter, and polishes syntax for readability. The paper defines three task-specific evaluation metrics and reports that DECLARE outperforms eight strong baselines on both traditional MT metrics and the task-specific measures, supporting the usefulness of the dataset and the approach. By pairing accurate term translation with sentence-level readability, MedLane and DECLARE could improve patient comprehension, informatics tools, and health communication research.

Abstract

Patients with low health literacy usually have difficulty understanding medical jargon and the complex structure of professional medical language. Although some studies have been proposed to automatically translate expert language into layperson-understandable language, only a few of them address both accuracy and readability simultaneously in the clinical domain. Simplification of clinical language therefore remains a challenging task that previous work has not fully addressed. To benchmark this task, we construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. In addition, we propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance compared with eight strong baselines. To evaluate performance fairly, we also propose three task-specific evaluation metrics. Experimental results demonstrate the utility of the annotated MedLane dataset and the effectiveness of the proposed DECLARE model.

Paper Structure

This paper contains 27 sections, 7 equations, 5 figures, 6 tables, and 3 algorithms.

Figures (5)

  • Figure 1: An example of annotating a sentence by a worker using two steps, i.e., rephrasing and simplifying. In the rephrasing step, three abbreviations are replaced by full forms. In the simplifying step, the full form "nasal cannula" is replaced by "tube insertion on nose".
  • Figure 2: Overview of the proposed DECLARE model.
  • Figure 3: Example of the failure of existing metrics.
  • Figure 4: Sentence length vs. performance.
  • Figure 5: Ascore changes with respect to $\alpha$ and $\beta$.
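
The two-step annotation procedure described in the Figure 1 caption (rephrase abbreviations into full forms, then simplify full forms into lay wording) can be illustrated with a small dictionary-lookup sketch. This is not the DECLARE model or the MedLane annotation tooling. The function names, the example sentence, and the abbreviation table are hypothetical; only the "nasal cannula" to "tube insertion on nose" pairing comes from the caption.

```python
import re

# Hypothetical abbreviation table, for illustration only (not taken from MedLane).
ABBREVIATIONS = {
    "pt": "patient",
    "NC": "nasal cannula",
}

# Full form -> lay wording. This single pairing is quoted in the Figure 1 caption.
LAY_TERMS = {
    "nasal cannula": "tube insertion on nose",
}


def substitute(sentence: str, table: dict) -> str:
    """Replace each whole-word occurrence of a key in `table` with its value."""
    if not table:
        return sentence
    # Try longer keys first so multi-word terms win over shorter sub-terms.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], sentence)


def rephrase_then_simplify(sentence: str) -> str:
    """Step 1: expand abbreviations. Step 2: swap full forms for lay wording."""
    expanded = substitute(sentence, ABBREVIATIONS)
    return substitute(expanded, LAY_TERMS)


if __name__ == "__main__":
    print(rephrase_then_simplify("pt on NC"))
```

A plain lookup like this is case-sensitive and ignores context, so it cannot disambiguate abbreviations with several meanings. It only shows the order of the two steps, not how a trained model would carry them out.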