Table of Contents

Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

TL;DR

This work addresses the lack of confidence estimation in retinal OCT AI by introducing FMUE, a fine-tuned foundation model with uncertainty estimation that detects 11 diseases and provides an uncertainty score to flag unreliable predictions. The method adapts the RETFound backbone with LoRA and an uncertainty-based classifier, enabling open-set anomaly detection via thresholding and delivering Grad-CAM explanations. FMUE outperforms RETFound, UIOS, and ophthalmologists on internal/external datasets and demonstrates reliable OOD detection, with uncertainty scores correlating with misclassification and an actionable decision gate for manual review. This approach promises safer deployment of automated retinal anomaly detection in real-world clinics, especially where device- and disease-domain shifts occur.

Abstract

The inability to express a confidence level and to detect unseen classes has limited the clinical implementation of artificial intelligence in the real world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). On the internal test set, FMUE achieved a higher F1 score (96.76%) than two state-of-the-art algorithms, RETFound and UIOS, which further improved to 98.44% with a thresholding strategy. On external test sets obtained from other OCT devices, FMUE achieved accuracies of 88.75% and 92.73% before and after thresholding, respectively. Our model is superior to two ophthalmologists, with a higher F1 score (95.17% vs. 61.93% and 71.72%). Moreover, our model correctly predicts high uncertainty scores for samples with ambiguous features, samples of non-target-category diseases, or samples of low quality, prompting manual checks and preventing misdiagnosis. FMUE provides a trustworthy method for automatic retinal anomaly detection in the real-world, open-set clinical environment.
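The thresholding strategy described above can be sketched as a simple decision gate. The snippet below is a toy illustration, not the authors' implementation: normalized entropy of the softmax output stands in for the paper's uncertainty score (whose exact formulation is not given here), and `theta` is an assumed threshold.

```python
import numpy as np

def uncertainty_gate(probs, theta=0.5):
    """Route a prediction based on an uncertainty score.

    `probs` is the classifier's output distribution over the 11 target
    categories. Normalized entropy is used here as a stand-in
    uncertainty score; it is NOT the paper's exact formulation.
    Returns (predicted_class, uncertainty, reliable_flag).
    """
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    uncertainty = entropy / np.log(len(probs))  # scale to [0, 1]
    pred = int(np.argmax(probs))
    return pred, uncertainty, bool(uncertainty < theta)

# A confident prediction passes the gate and is reported directly...
pred, unc, reliable = uncertainty_gate([0.95] + [0.05 / 10] * 10)
# ...while a near-uniform output exceeds theta and would be
# referred to an ophthalmologist for manual review.
pred2, unc2, reliable2 = uncertainty_gate([1 / 11] * 11)
```

In the paper's workflow, a sample failing this gate is not discarded; it is flagged for double-checking by a clinician, which is how the post-thresholding accuracy figures are obtained.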

Paper Structure

This paper contains 18 sections and 6 figures.

Figures (6)

  • Figure 1: Schematic diagram of our FMUE for clinical work. Step 1 adapts the pretrained RETFound to multi-class retinal disease classification on OCT images by means of supervised fine-tuning on data with explicit labels. We freeze the image encoder of RETFound (blue area) and insert additional trainable LoRA layers into RETFound for OCT image feature extraction. In addition, to increase the credibility of the AI model's predictions, we developed an uncertainty-based classifier to obtain the final prediction result with a corresponding uncertainty score. Step 2 shows the inference process of our FMUE in a real clinical environment. When the model is fed an image with obvious features of a retinal disease in the training categories, our FMUE gives a diagnosis result with an uncertainty score below the threshold $\theta$ to indicate that the diagnosis is reliable. Conversely, when the input image contains ambiguous features or is OOD data, our model gives a high uncertainty score above the threshold $\theta$ to indicate that the result is unreliable and refers the patient to an ophthalmologist for double-checking.
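The frozen-encoder-plus-LoRA adaptation described in the Figure 1 caption can be illustrated with a minimal low-rank-adapted linear layer. This is a NumPy sketch under generic assumptions; the rank, scaling, and shapes are illustrative defaults, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """A frozen dense weight W plus a trainable low-rank update B @ A.

    During fine-tuning, only A and B would receive gradients; W (the
    pretrained RETFound weight in the paper's setting) stays frozen.
    Rank and alpha here are illustrative, not the paper's values.
    """
    def __init__(self, d_in, d_out, rank=8, alpha=16):
        self.W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # y = W x + (alpha / r) * B (A x)
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

layer = LoRALinear(d_in=16, d_out=8)
x = rng.standard_normal(16)
y = layer(x)
```

Because `B` is zero-initialized, the LoRA path starts as an exact no-op (`layer(x) == layer.W @ x`), so fine-tuning begins from the pretrained model's behavior, which is the standard LoRA design choice.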
  • Figure 2: The performance of FMUE and UIOS on different datasets. a. Uncertainty density distributions for different datasets in FMUE and UIOS. Solid lines indicate validation and test datasets for the target categories of retinal diseases, while different colored dashed lines indicate different out-of-distribution datasets. $\theta$: threshold theta. b. The accuracy of FMUE and UIOS at different percentages of samples retained after excluding high-uncertainty samples, on the internal test set and 5 external test sets. The green and red curves represent UIOS and FMUE, respectively. The dots on the curves indicate the coordinates at the threshold.
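The accuracy-versus-retention analysis in Figure 2b is a standard selective-prediction curve: sort samples by uncertainty, keep the most certain fraction, and measure accuracy on what remains. A toy reimplementation (not the paper's evaluation code, with hypothetical data) could look like:

```python
import numpy as np

def accuracy_vs_coverage(y_true, y_pred, uncertainty, fractions):
    """Accuracy on the subset retained after discarding the most
    uncertain samples, for each retention fraction in `fractions`."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    order = np.argsort(uncertainty)  # most certain samples first
    accs = []
    for f in fractions:
        k = max(1, int(round(f * len(order))))
        keep = order[:k]
        accs.append(float(np.mean(y_true[keep] == y_pred[keep])))
    return accs

# Hypothetical data in which errors concentrate among the
# high-uncertainty samples (the last two predictions are wrong):
y_true = [0, 1, 2, 0, 1]
y_pred = [0, 1, 2, 1, 2]
unc    = [0.1, 0.2, 0.15, 0.8, 0.9]
print(accuracy_vs_coverage(y_true, y_pred, unc, [0.6, 1.0]))
# → [1.0, 0.6]
```

If the uncertainty score is well calibrated, as the paper argues for FMUE, accuracy rises as coverage shrinks, which is the shape traced by the curves in Figure 2b.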
  • Figure 3: The visualization results of FMUE by Grad-CAM and the detection results of six samples of OCT images with RETFound, UIOS, and our FMUE. (a) and (b) are samples with typical features of target diseases; (c) and (d) are samples with ambiguous features of target diseases; (e) and (f) are OOD samples not included in the training categories. Unlike RETFound, UIOS and FMUE provide prediction results with a corresponding uncertainty score to reflect the reliability of the predictions. $\theta$: threshold theta.
  • ...and 3 more figures