CT-AGRG: Automated Abnormality-Guided Report Generation from 3D Chest CT Volumes

Theo Di Piazza, Carole Lazarus, Olivier Nempont, Loic Boussel

TL;DR

The paper tackles unguided 3D chest CT report generation by introducing CT-AGRG, a two-stage architecture that first detects 18 abnormalities and then generates per-abnormality sentences using a GPT-2 decoder conditioned on anomaly-specific embeddings. The approach leverages a pre-trained visual encoder and per-label projections to create focused embeddings, with a lightweight MLP translating these into textual descriptions via pseudo self-attention. An extensive CT-RATE evaluation shows significant improvements in clinical-efficacy and natural-language-generation metrics over CT2Rep, corroborated by an ablative analysis that confirms the value of multi-task classification and latent-space augmentation. The method offers improved report completeness and clinical relevance while maintaining feasible training requirements on standard hardware, representing a practical advance for automated radiology reporting.

Abstract

The rapid increase in computed tomography (CT) scans and their time-consuming manual analysis have created an urgent need for robust automated analysis techniques in clinical settings. These techniques aim to assist radiologists and help them manage their growing workload. Existing methods typically generate entire reports directly from 3D CT images, without explicitly focusing on observed abnormalities. This unguided approach often results in repetitive content or incomplete reports, failing to prioritize anomaly-specific descriptions. We propose a new anomaly-guided report generation model, which first predicts abnormalities and then generates targeted descriptions for each. Evaluation on a public dataset demonstrates significant improvements in report quality and clinical relevance. We extend our work by conducting an ablation study to demonstrate its effectiveness.

Paper Structure (16 sections, 8 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Overview of the method. Pre-training. The input volume $x$ is passed through a visual extractor $\Phi_{V}$ (either CT-ViT (Hamamci et al., 2024) or CT-Net (Draelos et al., 2021)) to extract an embedding $h$. $h$ is then given to a classification head $\Psi$, which predicts the logit vector $\hat{y}$. Step 1. $h$ is fed into 18 projection heads ($\Psi^{p}_{i}, i \in \{1, \ldots, 18\}$) followed by small classification heads ($\Psi^{c}_{i}, i \in \{1, \ldots, 18\}$), one for each label. This step yields an embedding $h_{i}$ (and then $h^{a}_{i}$) specific to each label. Step 2. If a label indexed by $i$ is predicted as abnormal by its corresponding classification head $\Psi^{c}_{i}$, the associated embedding $h^{a}_{i}$ is transformed by a lightweight MLP $\Phi_{T}$ to obtain $e_{i} = \Phi_{T}(h^{a}_{i})$. An abnormality-specific description is then generated by a pre-trained GPT-2 conditioned on $e_{i}$.
  • Figure 2: Comparison of ground truth with reports generated by CT2Rep and CT-AGRG from the CT-RATE test set.
  • Figure 3: Comparison of ground truth with reports generated by CT2Rep and CT-AGRG from the CT-RATE test set.
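
The two-step control flow described in the Figure 1 caption can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The embedding size, the 0.5 threshold, the random weights, the single linear layer standing in for the MLP $\Phi_{T}$, and the placeholder `toy_decode` function (standing in for the GPT-2 decoder) are all assumptions made for illustration. The step from $h_{i}$ to $h^{a}_{i}$ is omitted. Only the count of 18 labels and the gating logic (generate a sentence only for labels predicted abnormal) come from the caption.

```python
import math
import random

NUM_LABELS = 18   # number of abnormality labels, from the Figure 1 caption
EMBED_DIM = 8     # toy embedding size (assumption; not stated in the text)
THRESHOLD = 0.5   # decision threshold on the sigmoid output (assumption)


def linear(weights, bias, x):
    """Apply y = W x + b to a plain Python list."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_head(rng, out_dim, in_dim):
    """Random toy weights standing in for a trained head (hypothetical helper)."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def detect_abnormalities(h, proj_heads, cls_heads, threshold=THRESHOLD):
    """Step 1: one projection head and one small classifier per label."""
    kept = {}
    for i, (proj, cls) in enumerate(zip(proj_heads, cls_heads)):
        h_i = linear(*proj, h)                 # label-specific embedding h_i
        prob = sigmoid(linear(*cls, h_i)[0])   # probability that label i is abnormal
        if prob >= threshold:
            kept[i] = h_i                      # keep only labels predicted abnormal
    return kept


def generate_report(h, proj_heads, cls_heads, mlp, decode):
    """Step 2: transform each kept embedding, then decode one sentence per label."""
    kept = detect_abnormalities(h, proj_heads, cls_heads)
    sentences = []
    for i in sorted(kept):
        e_i = linear(*mlp, kept[i])            # single linear layer standing in for Phi_T
        sentences.append(decode(i, e_i))
    return kept, sentences


def toy_decode(i, e_i):
    """Placeholder for the GPT-2 decoder; ignores e_i and returns a fixed template."""
    return f"Finding for label {i}."


rng = random.Random(0)
proj_heads = [make_head(rng, EMBED_DIM, EMBED_DIM) for _ in range(NUM_LABELS)]
cls_heads = [make_head(rng, 1, EMBED_DIM) for _ in range(NUM_LABELS)]
mlp = make_head(rng, EMBED_DIM, EMBED_DIM)
h = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]   # stand-in for Phi_V(x)

kept, sentences = generate_report(h, proj_heads, cls_heads, mlp, toy_decode)
print(len(kept), "labels flagged;", len(sentences), "sentences generated")
```

The point of the sketch is the gating: the report contains exactly one sentence per label flagged in Step 1, so a label that is not predicted abnormal never reaches the decoder.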