CT2Rep: Automated Radiology Report Generation for 3D Medical Imaging

Ibrahim Ethem Hamamci, Sezgin Er, Bjoern Menze

TL;DR

CT2Rep introduces the first framework for automated radiology report generation from 3D chest CT volumes, addressing both the lack of 3D-capable methods and data scarcity. It combines a novel auto-regressive causal transformer, used as a 3D vision feature extractor, with a transformer encoder and a transformer decoder that uses relational memory and memory-driven conditional layer normalization. The model maps a volume $x \in \mathbb{R}^{240\times480\times480}$ to a report, represented as a sequence of tokens over a vocabulary $\mathbb{V}$. The authors further extend the model with CT2RepLong, a longitudinal multimodal fusion system that applies cross-attention between prior volumes $x^{old}$ and prior reports $r^{old}$ to improve the description of the current volume. Evaluations on the CT-RATE dataset show that CT2Rep outperforms a 3D vision-encoder baseline (CT-Net) on NLG and clinical-efficacy metrics, and that CT2RepLong provides additional gains by leveraging historical information. The work is open-sourced to accelerate research on 3D radiology reporting.
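
To make the cross-attention idea behind CT2RepLong more concrete, here is a minimal sketch of single-head scaled dot-product cross-attention in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the token counts, the embedding width, the random projection weights, the choice of prior-volume features as queries, and the names `cross_attention`, `volume_tokens`, and `report_tokens` are all hypothetical.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: `queries` attend over `context`.

    queries: (n_q, d_model) tokens from one modality
    context: (n_c, d_model) tokens from the other modality
    Returns the fused tokens (n_q, d_model) and the attention weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ v, weights


# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
d_model = 16
volume_tokens = rng.normal(size=(8, d_model))  # stand-in for prior-volume features
report_tokens = rng.normal(size=(5, d_model))  # stand-in for prior-report embeddings

# Random projections stand in for learned weights (assumption).
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))

fused, weights = cross_attention(volume_tokens, report_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Swapping the first two arguments of `cross_attention` gives the opposite direction, in which report tokens attend over volume tokens. The source states only that cross-attention is applied between the two modalities, so the direction shown here is a choice made for the example.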

Abstract

Medical imaging plays a crucial role in diagnosis, with radiology reports serving as vital documentation. Automating report generation has emerged as a critical need to alleviate the workload of radiologists. While machine learning has facilitated report generation for 2D medical imaging, extending this to 3D has been unexplored due to computational complexity and data scarcity. We introduce the first method to generate radiology reports for 3D medical imaging, specifically targeting chest CT volumes. Given the absence of comparable methods, we establish a baseline using an advanced 3D vision encoder in medical imaging to demonstrate our method's effectiveness, which leverages a novel auto-regressive causal transformer. Furthermore, recognizing the benefits of leveraging information from previous visits, we augment CT2Rep with a cross-attention-based multi-modal fusion module and hierarchical memory, enabling the incorporation of longitudinal multimodal data. Access our code at https://github.com/ibrahimethemhamamci/CT2Rep

Paper Structure (14 sections, 5 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: CT2Rep features a novel auto-regressive causal transformer for 3D vision feature extraction, complemented by RM and MCLN-enhanced transformer-based encoder and decoder network for clinically accurate report generation.
  • Figure 2: CT2RepLong enhances CT2Rep with a cross-attention multi-modal fusion module and longitudinal memory for effective historical data integration.
  • Figure 3: Comparison of ground-truth with reports generated by a CT-Net-based baseline and CT2Rep, highlighting CT2Rep's medical precision with color codes.
  • Figure 4: CT2RepLong surpasses the baseline, leveraging longitudinal data for enhanced medical detail accuracy, with related terms color-coded for clarity.