Human-AI Collaboration and Explainability for 2D/3D Registration Quality Assurance
Sue Min Cho, Alexander Do, Russell H. Taylor, Mathias Unberath
TL;DR
This paper tackles quality assurance (QA) for 2D/3D registration in image-guided surgery by introducing a learning-based QA model augmented with Grad-CAM explainability and conformal prediction. It systematically compares four conditions: AI-only, Human-only, and two human-AI collaboration modes (with and without explanations). AI-only achieves the highest accuracy, while collaboration improves sensitivity, precision, and specificity over the standalone approaches and lowers workload relative to Human-only. Explanations increase users' understanding of the AI's predictions, though gains from XAI over non-explainable collaboration are modest and depend on design. The work shows that explainable AI can meaningfully support human oversight in safety-critical tasks; it points toward robust, human-centered QA in surgical navigation and suggests avenues for richer interaction, iterative explanations, and adaptive clinical thresholds.
Abstract
Purpose: As surgery increasingly integrates advanced imaging, algorithms, and robotics to automate complex tasks, human judgment of system correctness remains a vital safeguard for patient safety. A critical example is 2D/3D registration, where small registration misalignments can lead to surgical errors. Current visualization strategies alone are insufficient to reliably enable humans to detect these misalignments, highlighting the need for enhanced decision-support tools.
Methods: We propose the first artificial intelligence (AI) model tailored to 2D/3D registration quality assessment, augmented with explainable AI (XAI) mechanisms to clarify the model's predictions. Using both objective measures (e.g., accuracy, sensitivity, precision, specificity) and subjective evaluations (e.g., workload, trust, and understanding), we systematically compare decision-making across four conditions: AI-only, Human-only, Human+AI, and Human+XAI.
Results: The AI-only condition achieved the highest accuracy, whereas collaborative paradigms (Human+AI and Human+XAI) improved sensitivity, precision, and specificity compared to standalone approaches. Participants experienced significantly lower workload in collaborative conditions relative to the Human-only condition. Moreover, participants reported a greater understanding of AI predictions in the Human+XAI condition than in Human+AI, although no significant differences were observed between the two collaborative paradigms in perceived trust or workload.
Conclusion: Human-AI collaboration can enhance 2D/3D registration quality assurance, with explainability mechanisms improving user understanding. Future work should refine XAI designs to optimize decision-making performance and efficiency. Extending both the algorithmic design and human-XAI collaboration elements holds promise for more robust quality assurance of 2D/3D registration.
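The objective measures named in the abstract (accuracy, sensitivity, precision, specificity) all follow from the four confusion-matrix counts of a binary accept/reject decision. The Python sketch below is illustrative only and is not the authors' evaluation code. It assumes that label 1 marks the positive class, that this class is a misaligned registration, and that decisions are encoded as 0/1. The names `BinaryMetrics` and `binary_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # true positive rate (recall)
    precision: float    # positive predictive value
    specificity: float  # true negative rate


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute the four measures from 0/1 labels.

    Assumption: 1 is the positive class (here, a misaligned registration).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # An undefined ratio (zero denominator) is reported as NaN.
        return num / den if den else float("nan")

    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=ratio(tp, tp + fn),
        precision=ratio(tp, tp + fp),
        specificity=ratio(tn, tn + fp),
    )


# Made-up toy data, not results from the study.
toy_truth = [1, 1, 1, 0, 0, 0, 0, 0]
toy_decisions = [1, 1, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_truth, toy_decisions)
print(toy_metrics)
```

On the toy data the counts are TP = 2, FN = 1, TN = 4, FP = 1, giving an accuracy of 0.75, a sensitivity and precision of 2/3 each, and a specificity of 0.8. Reporting all four measures matters because accuracy alone does not say whether errors are missed misalignments or false alarms.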
