a2z-1 for Multi-Disease Detection in Abdomen-Pelvis CT: External Validation and Performance Analysis Across 21 Conditions

Pranav Rajpurkar; Julian N. Acosta; Siddhant Dogra; Jaehwan Jeong; Deepanshu Jindal; Michael Moritz; Samir Rajpurkar

TL;DR

The paper introduces a2z-1, an AI model for multi-disease detection on abdomen-pelvis CT across 21 conditions, and reports strong discrimination, with an external mean AUC of 0.923 and notable performance for time-sensitive conditions such as small bowel obstruction (SBO) and acute pancreatitis. It emphasizes external validation across two health systems, generalizability across varied imaging protocols and patient demographics, and a confidence-based workflow framework to balance precision and throughput. The work also analyzes mispredictions and labeling issues to distinguish model limitations from data quality problems, and argues for broader workflow integration and quality assurance (QA) use. Overall, the study advances clinically relevant AI for CT interpretation by combining broad disease coverage, rigorous external validation, and practical deployment considerations that could streamline radiology workflows and support quality improvement.

Abstract

We present a comprehensive evaluation of a2z-1, an artificial intelligence (AI) model designed to analyze abdomen-pelvis CT scans for 21 time-sensitive and actionable findings. Our study focuses on rigorous assessment of the model's performance and generalizability. Large-scale retrospective analysis demonstrates an average AUC of 0.931 across 21 conditions. External validation across two distinct health systems confirms consistent performance (AUC 0.923), establishing generalizability to different evaluation scenarios, with notable performance in critical findings such as small bowel obstruction (AUC 0.958) and acute pancreatitis (AUC 0.961). Subgroup analysis shows consistent accuracy across patient sex, age groups, and varied imaging protocols, including different slice thicknesses and contrast administration types. Comparison of high-confidence model outputs to radiologist reports reveals instances where a2z-1 identified overlooked findings, suggesting potential for quality assurance applications.
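As a minimal sketch of how an "average AUC across conditions" can be computed, the snippet below scores each condition separately and then takes an unweighted (macro) mean. The abstract does not state how the averaging was done, so the macro-average is an assumption, and the function names, condition keys, labels, and scores are all hypothetical toy values, not data from the study.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores
    a random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(
    per_condition: Dict[str, Tuple[Sequence[int], Sequence[float]]]
) -> Tuple[float, Dict[str, float]]:
    """Unweighted mean of per-condition AUCs (assumed averaging scheme)."""
    aucs = {name: roc_auc(y, s) for name, (y, s) in per_condition.items()}
    return sum(aucs.values()) / len(aucs), aucs


# Hypothetical toy inputs: (ground-truth labels, model scores) per condition.
toy = {
    "small_bowel_obstruction": ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    "acute_pancreatitis": ([0, 1, 0, 1], [0.1, 0.8, 0.6, 0.5]),
}
mean_auc, per_condition_auc = macro_auc(toy)
print(per_condition_auc, mean_auc)
```

A macro-average weights every condition equally regardless of prevalence, so a rare finding influences the headline number as much as a common one.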
