
Human-AI Collaboration and Explainability for 2D/3D Registration Quality Assurance

Sue Min Cho, Alexander Do, Russell H. Taylor, Mathias Unberath

TL;DR

This paper tackles quality assurance (QA) for 2D/3D registration in image-guided surgery by introducing a learning-based QA model augmented with Grad-CAM explanations and conformal prediction. It systematically compares four conditions: AI-only, Human-only, and two human-AI collaboration modes (with and without explanations). Collaboration improves sensitivity, precision, and specificity compared with the standalone approaches and lowers workload relative to Human-only review, and explanations improve users' understanding of the AI's predictions. The work shows that explainable AI can meaningfully support human oversight in a safety-critical task, although the gains of explainable over non-explainable collaboration are modest and depend on the explanation design. It lays out a path toward robust, human-centered QA for surgical navigation and points to future work on richer interaction, iterative explanations, and adaptive clinical thresholds.
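The summary mentions conformal prediction but does not describe how it is configured. As a hedged illustration only, the sketch below shows standard split conformal prediction for a binary accept/reject classifier, assuming the nonconformity score is 1 minus the predicted probability of the true label. The label names, function names, and calibration numbers are hypothetical and not taken from the paper. Under exchangeability of calibration and test cases, the returned prediction sets contain the true label with probability at least 1 - alpha.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical label names for the two QA decisions.
LABELS: Tuple[str, str] = ("accept", "reject")


def conformal_threshold(cal_probs: Sequence[Dict[str, float]],
                        cal_labels: Sequence[str],
                        alpha: float) -> float:
    """Split-conformal threshold q_hat at miscoverage level alpha.

    Nonconformity score of a calibration case: 1 - p(true label).
    q_hat is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration cases: every label is kept
    return scores[k - 1]


def prediction_set(probs: Dict[str, float], q_hat: float) -> Set[str]:
    """Labels whose nonconformity score 1 - p(label) does not exceed q_hat."""
    return {y for y in LABELS if 1.0 - probs[y] <= q_hat}


# Hypothetical calibration data: (softmax output, true label) pairs.
CALIBRATION: List[Tuple[Dict[str, float], str]] = [
    ({"accept": 0.9375, "reject": 0.0625}, "accept"),
    ({"accept": 0.125, "reject": 0.875}, "reject"),
    ({"accept": 0.75, "reject": 0.25}, "accept"),
    ({"accept": 0.25, "reject": 0.75}, "reject"),
    ({"accept": 0.625, "reject": 0.375}, "accept"),
    ({"accept": 0.625, "reject": 0.375}, "reject"),
    ({"accept": 0.75, "reject": 0.25}, "reject"),
]

if __name__ == "__main__":
    probs = [p for p, _ in CALIBRATION]
    labels = [y for _, y in CALIBRATION]
    q_hat = conformal_threshold(probs, labels, alpha=0.25)
    print(q_hat)                                                          # 0.625
    print(prediction_set({"accept": 0.875, "reject": 0.125}, q_hat))      # {'accept'}
    print(sorted(prediction_set({"accept": 0.5, "reject": 0.5}, q_hat)))  # ['accept', 'reject']
```

A two-label set such as {accept, reject} signals that the model cannot commit to a single decision at the chosen alpha; one plausible (hypothetical) use in a QA interface is to flag such cases for closer human review.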

Abstract

Purpose: As surgery increasingly integrates advanced imaging, algorithms, and robotics to automate complex tasks, human judgment of system correctness remains a vital safeguard for patient safety. A critical example is 2D/3D registration, where small misalignments can lead to surgical errors. Current visualization strategies alone do not reliably enable humans to detect these misalignments, highlighting the need for enhanced decision-support tools.

Methods: We propose the first artificial intelligence (AI) model tailored to 2D/3D registration quality assessment, augmented with explainable AI (XAI) mechanisms that clarify the model's predictions. Using both objective measures (e.g., accuracy, sensitivity, precision, specificity) and subjective evaluations (e.g., workload, trust, and understanding), we systematically compare decision-making across four conditions: AI-only, Human-only, Human+AI, and Human+XAI.

Results: The AI-only condition achieved the highest accuracy, whereas the collaborative paradigms (Human+AI and Human+XAI) improved sensitivity, precision, and specificity compared with the standalone approaches. Participants experienced significantly lower workload in the collaborative conditions than in the Human-only condition. Moreover, participants reported a greater understanding of AI predictions in the Human+XAI condition than in Human+AI, although no significant differences were observed between the two collaborative paradigms in perceived trust or workload.

Conclusion: Human-AI collaboration can enhance 2D/3D registration quality assurance, with explainability mechanisms improving user understanding. Future work should refine XAI designs to optimize decision-making performance and efficiency. Extending both the algorithmic design and human-XAI collaboration elements holds promise for more robust quality assurance of 2D/3D registration.
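The objective measures named in the abstract all follow from a 2x2 confusion matrix of accept/reject decisions against ground truth. The sketch below assumes, as a labeled choice not stated in the summary, that the positive class is "reject" (a misaligned registration that should be caught); the function names and toy decisions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    tp: int = 0  # misaligned case correctly rejected (positive class assumed to be "reject")
    fp: int = 0  # well-aligned case wrongly rejected
    tn: int = 0  # well-aligned case correctly accepted
    fn: int = 0  # misaligned case wrongly accepted


def tally(decisions: Sequence[str], truths: Sequence[str], positive: str = "reject") -> Confusion:
    """Count decision outcomes against ground-truth labels."""
    c = Confusion()
    for d, t in zip(decisions, truths):
        if d == positive:
            if t == positive:
                c.tp += 1
            else:
                c.fp += 1
        elif t == positive:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")  # undefined metric -> NaN


def qa_metrics(c: Confusion) -> Dict[str, float]:
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # share of misalignments caught
        "precision": _ratio(c.tp, c.tp + c.fp),    # share of rejections that were warranted
        "specificity": _ratio(c.tn, c.tn + c.fp),  # share of good registrations accepted
    }


if __name__ == "__main__":
    truths = ["reject", "reject", "accept", "accept", "accept"]
    decisions = ["reject", "accept", "accept", "reject", "accept"]
    print(qa_metrics(tally(decisions, truths)))
```

Accuracy equals (sensitivity × P + specificity × N) / (P + N), where P and N are the numbers of truly misaligned and well-aligned cases, so it is a prevalence-weighted average of sensitivity and specificity; comparisons across conditions are therefore only meaningful on the same set of cases.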

Paper Structure

This paper contains 19 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of Proposed Model Architecture (the X-ray, DRR (digitally reconstructed radiograph), and Grad-CAM outputs shown correspond to Specimen ID: 18-2800, Projection ID: 0, Sample ID: 24)
  • Figure 2: User interface in the Human+XAI condition. Participants viewed the AI’s classification decision ("Accept" or "Reject") along with confidence scores and Grad-CAM heatmaps overlaid on the X-ray images and registration overlay, highlighting the spatial regions that influenced the AI’s judgment.
  • Figure 3: Box plots of NASA-TLX scores across three conditions: (1) Human-only, (2) Human+AI, and (3) Human+XAI. Lower scores indicate lower perceived workload. Significance levels: *$p<0.05$, **$p<0.01$, ***$p<0.001$.
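Figures 1 and 2 show Grad-CAM heatmaps. For readers unfamiliar with the method, the sketch below computes a Grad-CAM map from K feature maps A^k of a convolutional layer and the gradients of a class score y^c (for example a pre-softmax "reject" logit) with respect to those maps: each map is weighted by its spatially averaged gradient, the weighted maps are summed, and a ReLU keeps only positively contributing regions. Which layer the paper uses, how its gradients are obtained, and how the map is upsampled onto the X-ray and DRR are not stated here. The sketch assumes activations and gradients have already been extracted (e.g. by an autodiff framework), and the final normalization to [0, 1] is a common display convention rather than part of the definition.

```python
from typing import List

Map = List[List[float]]  # one H x W feature map


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Grad-CAM heatmap from K feature maps A^k and gradients dy^c/dA^k (same shapes)."""
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pool the gradient of the class score over spatial positions
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for alpha, a in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * a[i][j]
    # ReLU keeps only regions with a positive influence on the class score
    cam = [[max(v, 0.0) for v in row] for row in cam]
    # Normalize to [0, 1] for display as an overlay (convention, not part of Grad-CAM itself)
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


if __name__ == "__main__":
    # Toy example: two 2x2 feature maps; the second has a negative average gradient.
    A = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
    G = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(A, G))  # [[1.0, 0.0], [0.0, 0.0]]
```

In the toy example the second map pushes the class score down, so after the ReLU only the region highlighted by the first map survives in the heatmap.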