Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach

Ziheng Zhao, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie

TL;DR

The paper advances abnormality-centric CT interpretation by introducing a 404-category taxonomy and OmniAbnorm-CT-14K, a large-scale multi-plane whole-body CT dataset with detailed grounding annotations. It then proposes OmniAbnorm-CT, a vision-language system that grounds abnormalities and generates clinically oriented descriptions in response to text prompts or visual cues. The authors define three practical tasks and a clinically grounded AbnormRubric metric, and report substantial improvements over baselines in both internal and external validations. Together, these contributions support more explainable and comprehensive CT interpretation across the entire body.

Abstract

Automated interpretation of CT images, particularly localizing and describing abnormal findings across multi-plane and whole-body scans, remains a significant challenge in clinical radiology. This work aims to address this challenge through four key contributions: (i) On taxonomy, we collaborate with senior radiologists to propose a comprehensive hierarchical classification system, with 404 representative abnormal findings across all body regions; (ii) On data, we contribute a dataset containing over 14.5K CT images from multiple planes and all human body regions, and meticulously provide grounding annotations for over 19K abnormalities, each linked to its detailed description and cast into the taxonomy; (iii) On model development, we propose OmniAbnorm-CT, which can automatically ground and describe abnormal findings on multi-plane and whole-body CT images based on text queries, while also allowing flexible interaction through visual prompts; (iv) On evaluation, we establish three representative tasks based on real clinical scenarios, and introduce a clinically grounded metric to assess abnormality descriptions. Through extensive experiments, we show that OmniAbnorm-CT significantly outperforms existing methods in both internal and external validations, and across all the tasks.

Paper Structure

This paper contains 34 sections, 10 equations, 7 figures, and 22 tables.

Figures (7)

  • Figure 1: This work introduces OmniAbnorm-CT-14K, the first large-scale dataset for grounding and describing abnormal findings in CT images. Left: OmniAbnorm-CT-14K contains 14.5K multi-plane, whole-body CT images, covering 349 representative abnormal findings across 82 anatomical structures and 40 major systems or organs. Right: Distribution of the dataset across anatomical structures and major systems or organs, with darker blue indicating higher sample density.
  • Figure 2: Data curation overview. (a) The image-report pairs are collected from an open-source, expert-checked website; (b) and (c) Radiologists provide grounding annotations for all abnormalities, link them to their text descriptions in the reports, and categorize them into the taxonomy devised by senior radiologists. The annotations are further extended to instruction data with simulated visual prompts and text queries.
  • Figure 3: OmniAbnorm-CT. We bridge a VLM and a segmentation module so that grounding evidence can be acquired while abnormality descriptions are generated, and further enhance its comprehension for flexible use with text instructions and visual prompts.
  • Figure 4: Qualitative comparison on the grounded report generation task.
  • Figure 5: Qualitative comparison on the text-guided grounded report generation task.
  • ...and 2 more figures