A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation Based on Cross-modal Deep Learning

Jing Wang, Junyan Fan, Meng Zhou, Yanzhu Zhang, Mingyu Shi

TL;DR

The paper addresses the bottleneck of interpreting ophthalmic ultrasound by constructing a labeled three-modal dataset that couples eye images, blood-flow indices, and Chinese text reports from 2,417 patients. It proposes a cross-modal memory network (CMN) built on ResNet-101 features and a Transformer-based encoder–decoder, with a shared memory to align image and text and handle Chinese terminology. Extensive experiments compare CMN to baselines (e.g., R2Gen) using standard NLG metrics, demonstrating superior performance and informative attention visualizations, while revealing practical challenges such as disease variability and language-specific processing. Overall, the dataset and CMN-based approach offer a path toward AI-assisted ophthalmic diagnosis and streamlined report generation, with potential impact on clinical workflows and ophthalmology research.

Abstract

Ultrasound imaging reveals eye morphology and aids in diagnosing and treating eye diseases. However, interpreting diagnostic reports requires specialized physicians. We present a labeled ophthalmic dataset for the precise analysis and automated exploration of medical images along with their associated reports. It collects data in three modalities (ultrasound images, blood flow information, and examination reports) from 2,417 patients at an ophthalmology hospital in Shenyang, China, during 2018; patient information is de-identified for privacy protection. To the best of our knowledge, it is the only ophthalmic dataset that contains all three modalities simultaneously. It consists of 4,858 images with corresponding free-text reports, which describe 15 typical imaging findings of intraocular diseases and the corresponding anatomical locations. Each image shows three kinds of blood flow indices at three specific arteries, i.e., nine parameter values that describe the spectral characteristics of the blood flow distribution. The reports were written by ophthalmologists during clinical care. The proposed dataset is applied to generate medical reports with a cross-modal deep learning model. The experimental results demonstrate that our dataset is suitable for training supervised models on cross-modal medical data.

Paper Structure (17 sections, 8 equations, 9 figures, 4 tables)

This paper contains 17 sections, 8 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: The framework for dataset construction and cross-modal generation.
  • Figure 2: An example of complete image processing. (a) Screening. (b) Regional selection and manual cropping. (c) Storage.
  • Figure 3: Example of report preprocessing.
  • Figure 4: Example of blood flow information extraction.
  • Figure 5: An example of complete data. (a) Cropped image. (b) JSON-formatted content after extracting the text report. (c) Extracted blood flow information.
  • ...and 4 more figures
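
The three-modal record described in the abstract and in the Figure 5 caption (a cropped image, a JSON-formatted report, and nine blood-flow values) could be represented roughly as follows. This is only a sketch under stated assumptions: the class name, field names, file path, and the placeholder artery and index labels are hypothetical, since the text above says only that there are three kinds of indices at three arteries and does not give the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List

# Placeholder labels (assumptions): the source does not name the three
# arteries or the three blood-flow indices in this section.
ARTERIES = ("artery_1", "artery_2", "artery_3")
INDICES = ("index_1", "index_2", "index_3")


@dataclass
class UltrasoundRecord:
    """One sample: cropped image, parsed report, blood-flow parameters."""

    image_path: str                          # path to the cropped image
    report: Dict[str, str]                   # fields parsed from the report
    blood_flow: Dict[str, Dict[str, float]]  # artery -> index -> value

    def __post_init__(self) -> None:
        # Enforce the 3 arteries x 3 indices layout stated in the abstract.
        if set(self.blood_flow) != set(ARTERIES):
            raise ValueError("expected exactly the three arteries")
        for artery, values in self.blood_flow.items():
            if set(values) != set(INDICES):
                raise ValueError(f"expected three indices for {artery}")

    def flow_vector(self) -> List[float]:
        """Flatten the parameters in a fixed artery-major order."""
        return [self.blood_flow[a][i] for a in ARTERIES for i in INDICES]


# Hypothetical example with dummy values, not taken from the dataset.
example = UltrasoundRecord(
    image_path="images/example_0001.png",
    report={"finding": "placeholder finding", "location": "placeholder location"},
    blood_flow={a: {i: 0.0 for i in INDICES} for a in ARTERIES},
)
print(len(example.flow_vector()))  # 9
```

Validating the layout at construction time keeps malformed samples out of a training pipeline, and a fixed flattening order gives every sample a nine-element numeric vector in the same arrangement.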