Enabling Collaborative Clinical Diagnosis of Infectious Keratitis by Integrating Expert Knowledge and Interpretable Data-driven Intelligence

Zhengqing Fang; Shuowen Zhou; Zhouhang Yuan; Yuxuan Si; Mengze Li; Jinxu Li; Yesheng Xu; Wenjia Xie; Kun Kuang; Yingming Li; Fei Wu; Yu-Feng Yao

TL;DR

AI-based biomarkers improve the diagnostic performance of inexperienced ophthalmologists, and intervention by experienced ophthalmologists improves the AI's predictions. Together, these results point to a promising diagnostic paradigm for infectious keratitis using KGDM, one that could be extended to other diseases where experienced medical practitioners are scarce and the safety of AI is a concern.

Abstract

Although data-driven artificial intelligence (AI) in medical image diagnosis has shown impressive performance in silico, its lack of interpretability makes it difficult to incorporate the "black box" into clinicians' workflows. To make the diagnostic patterns learned from data understandable to clinicians, we develop an interpretable model, the knowledge-guided diagnosis model (KGDM), that provides a visualized reasoning process containing AI-based biomarkers and retrieved cases with the same diagnostic patterns. It incorporates clinicians' prompts into the interpreted reasoning through human-AI interaction, leading to potentially enhanced safety and more accurate predictions. This study investigates the performance, interpretability, and clinical utility of KGDM in the diagnosis of infectious keratitis (IK), which is the leading cause of corneal blindness. The classification performance of KGDM is evaluated on a prospective validation dataset, an external testing dataset, and a publicly available testing dataset. The diagnostic odds ratios (DOR) of the interpreted AI-based biomarkers are effective, ranging from 3.011 to 35.233, and exhibit diagnostic patterns consistent with clinical experience. Moreover, a human-AI collaborative diagnosis test is conducted, and participants collaborating with KGDM achieve a performance exceeding that of both humans and AI alone. By synergistically integrating interpretability and interaction, this study facilitates the convergence of clinicians' expertise and data-driven intelligence. The improvement of inexperienced ophthalmologists aided by AI-based biomarkers, together with the improvement in AI predictions through intervention by experienced ones, demonstrates a promising diagnostic paradigm for infectious keratitis using KGDM, which could be extended to other diseases where experienced medical practitioners are scarce and the safety of AI is a concern.
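As a reminder of how a diagnostic odds ratio is commonly computed, the sketch below derives the DOR and a Wald-type 95% confidence interval from a 2x2 table in which a biomarker is treated as a binary test for a disease. The function name, the 0.5 continuity correction, and the example counts are illustrative assumptions, not the authors' implementation or data.

```python
import math


def diagnostic_odds_ratio(tp, fp, fn, tn, z=1.96):
    """Return (DOR, lower, upper) for a 2x2 table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives. The interval is a Wald
    interval computed on the log scale (an assumed, common choice).
    """
    # Assumed handling of empty cells: add 0.5 to every cell
    # (Haldane-Anscombe correction) so the ratio stays finite.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))

    dor = (tp * tn) / (fp * fn)
    # Standard error of log(DOR).
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    log_dor = math.log(dor)
    return dor, math.exp(log_dor - z * se), math.exp(log_dor + z * se)


# Hypothetical counts, for illustration only.
dor, lower, upper = diagnostic_odds_ratio(tp=30, fp=10, fn=10, tn=50)
print(f"DOR = {dor:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

An interval that excludes 1 (equivalently, a log-scale interval that excludes 0) suggests that the biomarker is associated with the disease.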

Paper Structure (24 sections, 20 equations, 8 figures, 8 tables)

Introduction
Method
Results
Human-AI collaborative diagnosis performance
Discussion
Code availability
Data availability
Acknowledgements
Author contributions statement
Methods

Figures (8)

Figure 1: (a) Different subclasses of infectious keratitis share similarities (left), but images from the same subclass may exhibit heterogeneity (right). (b) Overview of the knowledge-guided data-driven model (KGDM), where the classification depends on similarities between features of the input image and the prototypical parts. The latter are automatically learned from training data under penalties that minimize intra-class variance and inter-class correlation, based on expert experience. Each learned prototype is embedded into a vector and can be visualized as patterns in historical cases. Based on the visualized interpretation, humans can incorporate their diagnostic opinion into the classification process by intervening in the weighted sum of similarities. (c) Detailed description of how to visualize a learned prototype by its embedding vector on a test image. The high-similarity area is circled with a yellow contour, and the corresponding maximum similarity is used for classification. (d) Detailed description of human-AI complementary diagnosis. Clinicians can consult the interpretation of the classification process as a reference and give an AI-aided diagnostic opinion. Combined with the clinician's opinion of the current case patterns, the original reasoning process of KGDM can also be modified to neglect incorrect prototypes.
Figure 2: Inclusion and exclusion criteria of the SRRT. The visiting time, diagnosis, medical records, and slit lamp microscopic images of each case were rechecked by ophthalmologists via a standard process to include and exclude the images in the SRRT. We strictly selected 3223 high-quality images from 35361 images to train the case-based interpretation model.
Figure 3: Comparison of KGDM and purely data-driven models. The receiver operating characteristic (ROC) curves for diagnosing AK, BK, FK, and HSK on (a) SRRPV and (b) XS. KGDM performed better on FK and HSK. (c) and (d) show t-SNE plots visualizing the feature distributions of ResNet50 and KGDM, respectively, when embedding the same data; the feature distribution penalized by prior knowledge is visibly more separable.
Figure 4: Visualized reasoning process and interpretation of learned prototypes. (a) An example of the visualized reasoning process showing an HSK patient who was classified through similarities to HSK-related manifestation prototypes. The prototypes are illustrated with retrieved training samples. The high-similarity area is circled with yellow curves. (b) Quantitative evaluation of prototypes. Logarithmic scale of diagnostic odds ratios between 40 prototypes and 9 diseases in the SRRSH prospective validation dataset, measuring the correlation between the learned diagnostic patterns and the diseases (values are set to 0 if their 95% confidence interval spans 0). (c) Qualitative evaluation of learned prototypes. Representative regions of learned prototypes are shown inside the yellow contour, which contain valid and stable signs benefiting the diagnosis of target diseases. The corresponding diagnostic odds ratio (DOR) is calculated on the prospective validation dataset.
Figure 5: The performance of the Human-XAI collaboration test. (a) The human-AI collaborative diagnosis workflow based on sample-level interpretation and an interactive interface. (b) Comparison of the diagnostic performance improvement of the two groups, "Junior" and "Senior", before and after collaboration with KGDM. (c) ROC curves comparing the performance of each ophthalmologist before and after collaboration with KGDM. The orange dots represent the performance of each tester in the first step (Human), the green triangles represent their performance using KGDM, and the black dots and triangles represent the mean values of the two groups, respectively.
...and 3 more figures
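
The prototype-based reasoning described in the Figure 1 caption (maximum similarity per prototype, a weighted sum of similarities, and clinician intervention that neglects incorrect prototypes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: cosine similarity, the array shapes, and the names `prototype_logits` and `disabled` are hypothetical choices and are not taken from the paper or its code.

```python
import numpy as np


def prototype_logits(feature_map, prototypes, class_weights, disabled=()):
    """Score classes from patch-to-prototype similarities.

    feature_map:   (H, W, D) features of one image
    prototypes:    (P, D) learned prototype vectors
    class_weights: (C, P) weights of the final weighted sum
    disabled:      prototype indices a clinician chooses to neglect
    """
    h, w, d = feature_map.shape
    patches = feature_map.reshape(h * w, d)

    # Assumed similarity measure: cosine similarity per (patch, prototype).
    eps = 1e-8
    patches_n = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + eps)
    protos_n = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    sim = patches_n @ protos_n.T              # (H*W, P)

    # Each prototype contributes its maximum similarity over all patches.
    max_sim = sim.max(axis=0)                 # (P,)

    # Human intervention: zero out prototypes judged to be incorrect.
    mask = np.ones(len(prototypes))
    for p in disabled:
        mask[p] = 0.0

    logits = class_weights @ (max_sim * mask)  # (C,)
    return logits, max_sim


# Toy example with made-up numbers: 2x2 feature map, 2 prototypes, 2 classes.
fmap = np.array([[[1.0, 0.0], [1.0, 0.0]],
                 [[0.6, 0.8], [1.0, 0.0]]])
protos = np.array([[1.0, 0.0], [0.0, 1.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.5]])

before, _ = prototype_logits(fmap, protos, weights)
after, _ = prototype_logits(fmap, protos, weights, disabled=(1,))
print(before.argmax(), after.argmax())
```

In this toy case, neglecting the second prototype changes the predicted class, which is the kind of effect the intervention step is meant to allow.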
