Automated Retinal Image Analysis and Medical Report Generation through Deep Learning

Jia-Hong Huang

TL;DR

This work addresses the need to automate medical report generation from retinal images to alleviate ophthalmologist workload and improve diagnostic efficiency. It introduces a multi-stage, multi-modal deep learning framework comprising a retinal disease identifier (RDI), a clinical description generator (CDG), and a visual-explanation module, validated on the large DEN dataset. The thesis advances context-aware keyword representations, a non-local attention-based fusion (TransFuser), and expert-defined keywords to improve explainability and report quality, achieving state-of-the-art metrics on text-generation benchmarks. Collectively, these contributions offer a scalable, interpretable pathway to integrate AI-driven retinal analysis into clinical workflows, enhancing diagnostic accuracy, efficiency, and trust in automated medical reporting.

Abstract

The increasing prevalence of retinal diseases poses a significant challenge to the healthcare system, as the demand for ophthalmologists surpasses the available workforce. This imbalance creates a bottleneck in diagnosis and treatment, potentially delaying critical care. Traditional methods of generating medical reports from retinal images rely on manual interpretation, which is time-consuming and prone to errors, further straining ophthalmologists' limited resources. This thesis investigates the potential of Artificial Intelligence (AI) to automate medical report generation for retinal images. AI can quickly analyze large volumes of image data, identifying subtle patterns essential for accurate diagnosis. By automating this process, AI systems can greatly enhance the efficiency of retinal disease diagnosis, reducing doctors' workloads and enabling them to focus on more complex cases. The proposed AI-based methods address key challenges in automated report generation: (1) improved methods for medical keyword representation enhance the system's ability to capture nuances in medical terminology; (2) a multi-modal deep learning approach captures interactions between textual keywords and retinal images, resulting in more comprehensive medical reports; (3) interpretability techniques make the AI-based report generation system more transparent, fostering trust and acceptance in clinical practice. These methods are rigorously evaluated using various metrics and achieve state-of-the-art performance. This thesis demonstrates AI's potential to revolutionize retinal disease diagnosis by automating medical report generation, ultimately improving clinical efficiency, diagnostic accuracy, and patient care. [https://github.com/Jhhuangkay/DeepOpht-Medical-Report-Generation-for-Retinal-Images-via-Deep-Models-and-Visual-Explanation]

Paper Structure (80 sections, 31 equations, 21 figures, 13 tables)

Figures (21)

  • Figure 1: An overview of the thesis.
  • Figure 2: Panel (a) illustrates an existing traditional medical treatment procedure for retinal diseases \cite{tukey2014impact}, where doctors are primarily responsible for most tasks. In contrast, panel (b) integrates our AI-based medical diagnosis method into the traditional treatment process, aiming to enhance efficiency in line with the point-of-care (POC) concept \cite{pai2012point}. The proposed method consists of DNN-based and DNN visual explanation modules. The outputs from the DNN-based module include the "Disease Class A" and "Clinical Description." Meanwhile, the DNN visual explanation module provides a visualization of the information generated by the DNN-based module for classification tasks. For a more detailed explanation, please refer to Section \ref{me:method_2.4}. Note that DNN stands for deep neural networks in this figure.
  • Figure 3: Illustration of the proposed AI-based medical diagnosis method for the ophthalmology domain. It comprises DNN-based and DNN visual explanation modules. The DNN-based module includes two sub-modules: an RDI and a CDG, reinforced by our proposed keyword-driven method, as detailed in Section \ref{me:method_2.4}. The input to our method is a retinal image, and the output is a table-based medical report \cite{zahalka2014towards}. Figure \ref{fig:figure_2_2} demonstrates how this AI-based method can enhance the traditional retinal disease treatment process.
  • Figure 4: Examples from our DEN dataset. Each image is accompanied by three labels: the name of the disease, keywords, and a clinical description. All labels have been defined by ophthalmologists to ensure accuracy and relevance.
  • Figure 5: Illustration of the word length distribution for the keyword and clinical description labels. The majority of word lengths in our DEN dataset range between 5 and 10 words.
  • ...and 16 more figures
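
The data flow described in the captions of Figures 2 and 3 (retinal image in, table-based report out, with an RDI, a keyword-driven CDG, and a visual explanation module) can be sketched roughly as follows. This is only an illustrative skeleton, not the thesis implementation: the names `MedicalReport`, `generate_report`, and the `stub_*` functions are hypothetical, the stubs stand in for trained networks, and the exact way keywords reach the CDG is an assumption based on the captions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical stand-in for a retinal image: a 2-D grid of pixel intensities.
Image = Sequence[Sequence[float]]


@dataclass
class MedicalReport:
    """Table-based report assembled from the outputs of the three modules."""
    disease_class: str
    keywords: List[str]
    clinical_description: str
    explanation_map: List[List[float]]

    def as_table(self) -> str:
        # Render the textual fields as a simple two-column table.
        rows = [
            ("Disease class", self.disease_class),
            ("Keywords", ", ".join(self.keywords)),
            ("Clinical description", self.clinical_description),
        ]
        width = max(len(name) for name, _ in rows)
        return "\n".join(f"{name.ljust(width)} | {value}" for name, value in rows)


def generate_report(
    image: Image,
    rdi: Callable[[Image], str],
    cdg: Callable[[Image, Sequence[str]], str],
    explainer: Callable[[Image, str], List[List[float]]],
    keywords: Sequence[str],
) -> MedicalReport:
    disease = rdi(image)                 # retinal disease identifier
    description = cdg(image, keywords)   # keyword-driven description generator
    heatmap = explainer(image, disease)  # visual explanation for the classification
    return MedicalReport(disease, list(keywords), description, heatmap)


# Stub modules (assumptions for illustration only; real ones would be trained DNNs).
def stub_rdi(image: Image) -> str:
    return "Disease Class A"


def stub_cdg(image: Image, keywords: Sequence[str]) -> str:
    return "Findings consistent with: " + ", ".join(keywords)


def stub_explainer(image: Image, disease: str) -> List[List[float]]:
    # Normalise intensities to [0, 1] as a placeholder for a saliency map.
    peak = max(max(row) for row in image) or 1.0
    return [[pixel / peak for pixel in row] for row in image]
```

Calling `generate_report(image, stub_rdi, stub_cdg, stub_explainer, ["kw1", "kw2"]).as_table()` returns a small text table. Passing the modules in as callables keeps the sketch independent of any particular network architecture.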