a2z-1 for Multi-Disease Detection in Abdomen-Pelvis CT: External Validation and Performance Analysis Across 21 Conditions

Pranav Rajpurkar, Julian N. Acosta, Siddhant Dogra, Jaehwan Jeong, Deepanshu Jindal, Michael Moritz, Samir Rajpurkar

TL;DR

The paper introduces a2z-1, an AI model that detects 21 conditions on abdomen-pelvis CT. It reports strong discrimination, with a mean external AUC of 0.923 and notable performance on time-sensitive conditions such as small bowel obstruction (SBO) and acute pancreatitis. The study emphasizes external validation across two health systems, generalizability across imaging protocols and patient demographics, and a confidence-based workflow framework that balances precision against throughput. It also analyzes mispredictions and labeling issues to separate model limitations from data-quality problems, and argues for broader workflow integration and quality assurance (QA) use. Overall, the work combines broad disease coverage, external validation, and practical deployment considerations that could streamline radiology workflows and support quality improvement.

Abstract

We present a comprehensive evaluation of a2z-1, an artificial intelligence (AI) model designed to analyze abdomen-pelvis CT scans for 21 time-sensitive and actionable findings. Our study focuses on rigorous assessment of the model's performance and generalizability. Large-scale retrospective analysis demonstrates an average AUC of 0.931 across 21 conditions. External validation across two distinct health systems confirms consistent performance (AUC 0.923), establishing generalizability to different evaluation scenarios, with notable performance in critical findings such as small bowel obstruction (AUC 0.958) and acute pancreatitis (AUC 0.961). Subgroup analysis shows consistent accuracy across patient sex, age groups, and varied imaging protocols, including different slice thicknesses and contrast administration types. Comparison of high-confidence model outputs to radiologist reports reveals instances where a2z-1 identified overlooked findings, suggesting potential for quality assurance applications.
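For readers unfamiliar with the headline metric, the following is a minimal sketch of how a mean AUC across several conditions could be computed. It assumes an unweighted (macro) average of per-condition AUCs, which the abstract does not spell out. The function names `auc` and `mean_auc` and the toy labels and scores are hypothetical illustrations, not data or code from the study.

```python
from typing import Dict, List, Tuple


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(
    data: Dict[str, Tuple[List[int], List[float]]]
) -> Tuple[float, Dict[str, float]]:
    """Unweighted mean of per-condition AUCs (an assumed averaging scheme)."""
    per_condition = {name: auc(y, s) for name, (y, s) in data.items()}
    return sum(per_condition.values()) / len(per_condition), per_condition


# Toy, made-up labels (1 = finding present) and model scores for two conditions.
toy = {
    "small bowel obstruction": ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    "acute pancreatitis": ([1, 0, 0, 1], [0.8, 0.2, 0.3, 0.7]),
}
macro, per_condition_auc = mean_auc(toy)
print(per_condition_auc, macro)
```

The pairwise loop is quadratic in the number of cases and is meant only to make the definition explicit; a rank-based implementation would be preferable at the scale of a real validation set.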

Paper Structure

This paper contains 36 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: a2z-1 detection of subtle pancreatitis overlooked in initial report.
  • Figure 2: a2z-1 model performance across 21 abdominal conditions. Each segment represents the model's AUC score for detecting a specific condition in abdomen-pelvis CT scans, demonstrating high and consistent performance across both internal and external validation datasets.
  • Figure 3: a2z-1 detects free air finding missed in initial read.
  • Figure 4: a2z-1 enhances diagnostic confidence in acute cholecystitis case.
  • Figure 5: a2z-1 model performance across internal and external validation sets. The model shows consistent AUCs across sites, particularly for small bowel obstruction and appendicitis. Improved performance for liver cirrhosis and retroperitoneal hemorrhage in external sites suggests differences in patient populations. While slight variability is seen in coronary artery calcification and aortic dissection, overall performance remains strong across clinical environments.
  • ...and 5 more figures