RetSTA: An LLM-Based Approach for Standardizing Clinical Fundus Image Reports

Jiushen Cai, Weihang Zhang, Hanruo Liu, Ningli Wang, Huiqi Li

TL;DR

This paper tackles the lack of standardized fundus diagnostic reports by constructing a bilingual terminology anchored to ICD-11, PPP, and SNOMED-CT, augmented with real-world descriptive phrases. It introduces two LLM-based standardizers: RetSTA-7B-Zero, trained on augmented data for robust handling of complex clinical expressions, and RetSTA-7B, built from a large bilingual dataset to achieve report-level standardization across languages. Through extensive experiments, RetSTA-7B outperforms multiple baselines in both English and Chinese report standardization, while data augmentation and bilingual data further boost performance. The work enables better data interoperability and lays the groundwork for downstream ophthalmology NLP and multimodal AI applications.

Abstract

Standardization of clinical reports is crucial for improving the quality of healthcare and facilitating data integration. The lack of unified standards for format, terminology, and style is a major challenge in clinical fundus diagnostic reports, and it makes the data harder for large language models (LLMs) to understand. To address this, we construct a bilingual standard terminology containing fundus clinical terms and descriptions commonly used in clinical diagnosis. We then establish two models, RetSTA-7B-Zero and RetSTA-7B. RetSTA-7B-Zero, fine-tuned on an augmented dataset simulating clinical scenarios, demonstrates powerful standardization behaviors. However, it is limited in the range of diseases it covers. To further enhance standardization performance, we build RetSTA-7B, which integrates a substantial amount of standardized data generated by RetSTA-7B-Zero along with corresponding English data, covering diverse complex clinical scenarios and achieving report-level standardization for the first time. Experimental results demonstrate that RetSTA-7B outperforms the other compared LLMs on the bilingual standardization task, which validates its superior performance and generalizability. The checkpoints are available at https://github.com/AB-Story/RetSTA-7B.

Paper Structure

This paper contains 14 sections, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: The importance of standardization: (a) Non-standardized reports lead to difficulties in comprehension. (b) Standardized reports enhance clarity and accuracy of understanding.
  • Figure 2: Overview of the diagnostic report standardization paradigm.