2-Factor Retrieval for Improved Human-AI Decision Making in Radiology
Jim Solomon, Laleh Jalilian, Alexander Vilesov, Meryl Mathew, Tristan Grogan, Arash Bedayat, Achuta Kadambi
TL;DR
The paper asks how to design AI decision support that clinicians can verify, in response to the trust and transparency problems of opaque models. It introduces 2-factor retrieval (2FR), which combines an interface that presents AI diagnoses with retrieval of similarly labeled canonical images to support verification. In a chest X-ray study with 69 clinicians, 2FR yielded the highest overall accuracy (~70%) when AI predictions were correct and was especially beneficial for radiologists and low-confidence clinicians. When AI predictions were incorrect, all modalities regressed toward No AI performance, highlighting the need for robust verification-based designs. The authors also discuss extending 2FR to other domains and directions for future work.
Abstract
Human-machine teaming in medical AI requires us to understand to what degree a trained clinician should weigh AI predictions. While previous work has shown the potential of AI assistance for improving clinical predictions, existing clinical decision support systems either provide no explanation of their predictions or use techniques like saliency and Shapley values, which do not allow for physician-based verification. To address this gap, this study compares previously used explainable AI techniques with a newly proposed technique termed '2-factor retrieval (2FR)', which is a combination of interface design and search retrieval that returns similarly labeled data without processing this data. This results in a 2-factor security blanket where: (a) correct images need to be retrieved by the AI; and (b) humans should associate the retrieved images with the current pathology under test. We find that when tested on chest X-ray diagnoses, 2FR leads to increases in clinician accuracy, with particular improvements when clinicians are radiologists and have low confidence in their decisions. Our results highlight the importance of understanding how different modes of human-AI decision making may impact clinician accuracy in clinical decision support systems.
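
The two factors described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ReferenceImage`, `retrieve_same_label`, and `two_factor_check` are hypothetical. The reference bank is assumed to be a simple list of labeled images, and the clinician's judgment is stood in for by a callback.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ReferenceImage:
    """Hypothetical record for one labeled reference chest X-ray."""
    image_id: str
    label: str  # diagnosis label attached to the reference image


def retrieve_same_label(predicted_label: str,
                        bank: List[ReferenceImage],
                        k: int = 3) -> List[ReferenceImage]:
    """Factor 1 (assumed form): return up to k reference images whose
    label equals the AI's predicted label. The images are returned
    unmodified, with no saliency maps or other processing."""
    matches = [img for img in bank if img.label == predicted_label]
    return matches[:k]


def two_factor_check(predicted_label: str,
                     bank: List[ReferenceImage],
                     clinician_agrees: Callable[[List[ReferenceImage]], bool],
                     k: int = 3) -> bool:
    """Factor 2 (assumed form): the clinician compares the retrieved
    images with the X-ray under review and decides whether they show
    the same pathology. `clinician_agrees` is a placeholder for that
    human judgment."""
    retrieved = retrieve_same_label(predicted_label, bank, k)
    if not retrieved:
        # Nothing to verify against, so the prediction is not confirmed.
        return False
    return clinician_agrees(retrieved)
```

In this sketch a prediction counts as verified only when both factors hold: the retrieval returns images for the predicted label, and the human reviewer associates those images with the case at hand.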
