Explainable AI Enhances Glaucoma Referrals, Yet the Human-AI Team Still Falls Short of the AI Alone

Catalina Gomez, Ruolin Wang, Katharina Breininger, Corinne Casey, Chris Bradley, Mitchell Pavlak, Alex Pham, Jithin Yohannan, Mathias Unberath

TL;DR

This work addresses how explainable AI can assist primary eye care providers in triaging glaucoma referrals to specialists. It develops both a black-box DLM and an intrinsically interpretable weight-of-evidence (WoE) scorecard, and pairs them with a human-AI study in which optometrists, under four interaction conditions, decide whether each patient needs an urgent referral, a near-term referral, or no referral. AI support improves referral accuracy over unaided decisions, yet AI-alone performance remains higher than that of human-AI teams. Scoring-based explanations were rated most favorably for perceived usefulness and willingness to use the system in the future. The results highlight the potential and current limitations of human-AI collaboration in glaucoma care, underscoring the need for user-centered design and further research to safely integrate AI into primary care workflows.

Abstract

Primary care providers are vital for initial triage and referrals to specialty care. In glaucoma, asymptomatic and fast progression can lead to vision loss, necessitating timely referrals to specialists. However, primary eye care providers may not identify urgent cases, potentially delaying care. Artificial Intelligence (AI) offering explanations could enhance their referral decisions. We investigate how various AI explanations help providers distinguish between patients needing immediate or non-urgent specialist referrals. We built explainable AI algorithms to predict glaucoma surgery needs from routine eyecare data as a proxy for identifying high-risk patients. We incorporated intrinsic and post-hoc explainability and conducted an online study with optometrists to assess human-AI team performance, measuring referral accuracy and analyzing interactions with AI, including agreement rates, task time, and user experience perceptions. AI support enhanced referral accuracy among 87 participants (59.9% with AI vs. 50.8% without), though human-AI teams underperformed compared to AI alone. Participants believed they incorporated AI advice more when using the intrinsic model, and perceived it as more useful and promising. Without explanations, deviations from AI recommendations increased. AI support did not increase workload, confidence, or trust, but reduced challenges. On a separate test set, our black-box and intrinsic models achieved an accuracy of 77% and 71%, respectively, in predicting surgical outcomes. We identify opportunities for human-AI teaming in glaucoma management in primary eye care, noting that while AI support enhances referral accuracy, human-AI teams still show a performance gap compared to AI alone, even with explanations. Human involvement remains essential in medical decision making, underscoring the need for future research to optimize collaboration, ensuring positive experiences and safe AI use.

Paper Structure (26 sections, 5 figures, 3 tables)

This paper contains 26 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Visualization of the web interface for referral of patients to specialists. Participants first review the VF, OCT, and clinical data on the left side. The Human Only or AI recommendation (No Explanation/Feature Importance Explanations/Scoring-based Explanations) is displayed on the top right side. If present, explanations are displayed below the box that displays the AI recommendation (not shown in this image). Participants then decide whether the patient needs a referral to a glaucoma specialist within 3 months, within 3-12 months, or does not currently need a referral.
  • Figure 2: Example of Feature Importance Explanations. The AI recommendation is accompanied by the top three most important features calculated through SHAP values. The patient is recommended to schedule an urgent referral to a glaucoma specialist (within 3 months). High PSD, low average RNFL thickness, and low MD are identified as the top three most crucial features influencing the AI recommendation.
  • Figure 3: Example of Scoring-based Explanations. The AI recommendation is displayed with risk scores calculated using the Credit Scorecard method. The referral score is presented within a range that is associated with a referral recommendation, as shown on the left. Detailed rules for score calculation based on the feature distribution are provided on the right. The glaucoma risk scoring system for patient referral recommendation assigns a score to each risk factor ultimately integrated into the predictive model. Point scores calculation: The base score is set at 470, and the final score of 342 (derived from 470-9-55-19-28-17=342) is determined by the mathematical formula and corresponds to an urgent referral within 3 months.
  • Figure 4: Box plots for objective measures across different explanation conditions and baselines. Agreement and AI deviation scores only include groups with AI support.
  • Figure 5: Distribution of five-point rating scale responses for the following constructs: A) Confidence, B) Workload - Effort, C) Workload - Frustration, D) Trust, E) Support decision making, F) Helpfulness, and G) Future use. For each measure, explanation groups (No Explanation, Feature Importance, Scoring-based) and baseline (on the top if present) are shown.
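
The point arithmetic in the Figure 3 caption can be reproduced with a minimal Python sketch. The base score (470), the five deductions, and the final score (342) come from the caption; everything else is an assumption made for illustration. In particular, the caption does not say which risk factor each deduction belongs to, so the dictionary keys are placeholders, and the cut-off values in `referral_category` are hypothetical: the caption only states that a score of 342 corresponds to an urgent referral within 3 months.

```python
# Base score stated in the Figure 3 caption.
BASE_SCORE = 470

# Point deductions from the Figure 3 example. The caption does not name the
# risk factor behind each value, so these keys are placeholders.
EXAMPLE_POINTS = {
    "factor_1": -9,
    "factor_2": -55,
    "factor_3": -19,
    "factor_4": -28,
    "factor_5": -17,
}


def referral_score(points, base=BASE_SCORE):
    """Add the per-factor points to the base score."""
    return base + sum(points.values())


def referral_category(score, urgent_below=400, near_term_below=450):
    """Map a score to one of the three referral options from Figure 1.

    The two cut-offs are HYPOTHETICAL and are not taken from the paper.
    Lower scores indicate higher risk in this sketch, consistent with the
    example in which 342 maps to an urgent referral.
    """
    if score < urgent_below:
        return "within 3 months"
    if score < near_term_below:
        return "3-12 months"
    return "no referral currently"


if __name__ == "__main__":
    score = referral_score(EXAMPLE_POINTS)
    print(score, "->", referral_category(score))
```

Because each factor contributes a fixed number of points, a reader can trace the recommendation back to individual inputs by hand, which is what distinguishes this style of explanation from a post-hoc feature ranking.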