Improving Model's Interpretability and Reliability using Biomarkers

Gautam Rajendrakumar Gare, Tom Fox, Beam Chansangavej, Amita Krishnan, Ricardo Luis Rodriguez, Bennett P deBoisblanc, Deva Kannan Ramanan, John Michael Galeotti

TL;DR

This work addresses the need for interpretable AI in safety-critical medicine by proposing a biomarker-based lung ultrasound (LUS) diagnostic pipeline that enforces an interpretable biomarker bottleneck before downstream decision rules. Explanations from a decision-tree classifier operating on clinically established biomarkers are compared with Grad-CAM saliency maps in a user study involving three LUS clinicians. Results indicate that decision-tree explanations help detect false positives, while saliency maps more readily assist with true positives; combining both explanations yields the most consistent clinician judgments. The findings support deploying biomarker-driven interpretability tools to enhance the reliability and safety of lung ultrasound AI in clinical practice.

Abstract

Accurate and interpretable diagnostic models are crucial in the safety-critical field of medicine. We investigate the interpretability of our proposed biomarker-based lung ultrasound diagnostic pipeline to enhance clinicians' diagnostic capabilities. The objective of this study is to assess whether explanations from a decision tree classifier, utilizing biomarkers, can improve users' ability to identify inaccurate model predictions compared to conventional saliency maps. Our findings demonstrate that decision tree explanations, based on clinically established biomarkers, can assist clinicians in detecting false positives, thus improving the reliability of diagnostic models in medicine.

Paper Structure (4 sections, 4 figures)

Figures (4)

  • Figure 1: In contrast to conventional approaches that learn task-specific features, we proposed in Gare2022LearningTasks decoupling feature extraction from end tasks by forcing models to pass through an interpretable feature bottleneck of clinically established biomarkers. With the help of clinicians, we defined 38 lung ultrasound biomarkers that match or exceed DNN performance, especially in the low-data regime. Because the feature encoder extracts biomarkers well known to ultrasound radiologists, clinicians can now easily verify the output of the black-box feature extractor. The downstream classifiers (such as decision trees) that operate on these biomarkers are also diagnosable, since they operate on known features. This gives rise to a fully interpretable diagnostic model.
  • Figure 2: Diagnosing the model's misprediction of the lung-severity score as 0 instead of 1 for the above ultrasound clip. By examining the decision-tree explanations, we can easily discern that the model overlooked the presence of B-lines, leading to the misprediction. In contrast, relying solely on the saliency map makes it challenging to identify the cause of the misprediction, because the model's attention is distributed across various regions in the frames, including the B-line. This makes it difficult to quantify the contribution of the B-line to the final prediction.
  • Figure 3: Box plot showing the clinicians' rates of detecting whether the AI's output was correct. Decision trees are effective at helping users detect false positive predictions. Overall, having a model analysis tool improves on the baseline accuracy of 60%, with saliency maps being the most beneficial.
  • Figure 4: With our decision-tree explanations, we can easily interpret the video. This would be helpful for novice clinicians, who in this case could mistake the vertically stacked A-line bands for B-lines, whereas our decision-tree explanation clearly shows that no B-lines are present.
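
The kind of decision-tree explanation described in Figures 1, 2, and 4 can be illustrated with a minimal sketch. It assumes a tiny hand-built tree over two made-up biomarker scores; the names `b_line_score` and `consolidation_score`, the 0.5 thresholds, and the tree itself are hypothetical and are not the authors' 38 biomarkers or their trained classifier. The point is only that a prediction made on named biomarkers comes with a readable decision path, so a reviewer can see which biomarker estimate drove the output.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    severity: int  # predicted lung-severity score at this leaf


@dataclass
class Split:
    biomarker: str    # name of the biomarker tested at this node
    threshold: float  # take the right branch when value > threshold
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def predict_with_explanation(
    tree: Node, biomarkers: Dict[str, float]
) -> Tuple[int, List[str]]:
    """Walk the tree and record every biomarker test taken along the way."""
    path: List[str] = []
    node = tree
    while isinstance(node, Split):
        value = biomarkers[node.biomarker]
        go_right = value > node.threshold
        op = ">" if go_right else "<="
        path.append(f"{node.biomarker} = {value:.2f} {op} {node.threshold:.2f}")
        node = node.right if go_right else node.left
    return node.severity, path


# Hypothetical toy tree (illustration only, not a trained model).
TOY_TREE = Split(
    biomarker="b_line_score",
    threshold=0.5,
    left=Leaf(severity=0),
    right=Split(
        biomarker="consolidation_score",
        threshold=0.5,
        left=Leaf(severity=1),
        right=Leaf(severity=2),
    ),
)

# Hypothetical biomarker estimates from the feature encoder for one clip.
estimated = {"b_line_score": 0.20, "consolidation_score": 0.05}
score, path = predict_with_explanation(TOY_TREE, estimated)
print("predicted severity:", score)
for step in path:
    print("  because", step)
```

In this toy case the printed path shows that the low `b_line_score` estimate alone led to a severity of 0. A clinician who sees B-lines in the clip could therefore trace the error to the biomarker estimate, which is the style of diagnosis the Figure 2 caption describes.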