TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation

Santosh Sanjeev, Fadillah Adamsyah Maani, Arsen Abzhanov, Vijay Ram Papineni, Ibrahim Almakky, Bartłomiej W. Papież, Mohammad Yaqub

TL;DR

TiBiX addresses the omission of temporal context in chest X-ray (CXR) report generation by introducing a temporal bidirectional framework that models the current CXR, the current report, and the prior CXR in a single sequence. It relies on a transformer with causal attention and a temporal token, which encodes the interval between scans, to fuse these three inputs, and it introduces MIMIC-T, a longitudinal dataset derived from MIMIC-CXR. The paper reports state-of-the-art results on report generation and performance on par with state-of-the-art image generation, with ablations confirming the benefit of including prior scans. This work provides a baseline for longitudinal, bidirectional CXR-to-report generation and opens avenues for temporally aware evaluation and knowledge-augmented radiology AI.

Abstract

With the emergence of vision language models in the medical imaging domain, numerous studies have focused on two dominant research activities: (1) report generation from chest X-rays (CXR), and (2) synthetic scan generation from text or reports. Although some research has incorporated multi-view CXRs into the generative process, prior patient scans and reports have generally been disregarded. This can inadvertently lead to the omission of important medical information, thus affecting generation quality. To address this, we propose TiBiX: Leveraging Temporal information for Bidirectional X-ray and Report Generation. By considering previous scans, our approach facilitates bidirectional generation, primarily addressing two challenging problems: (1) generating the current image from the previous image and current report and (2) generating the current report based on both the previous and current images. Moreover, we extract and release a curated temporal benchmark dataset derived from the MIMIC-CXR dataset. Our comprehensive experiments and ablation studies explore the merits of incorporating prior CXRs and achieve state-of-the-art (SOTA) results on the report generation task. Furthermore, we attain on-par performance with SOTA image generation efforts, establishing a new baseline in longitudinal bidirectional CXR-to-report generation. The code is available at https://github.com/BioMedIA-MBZUAI/TiBiX.

Paper Structure (13 sections, 2 equations, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Our temporal bidirectional CXR-to-report generation framework, which handles three inputs (i.e., current report, current CXR, and previous CXR). We use an image tokenizer $B_x$ and a text tokenizer $B_r$ to tokenize the CXR image(s) and the report, respectively. A transformer-based model $\mathcal{M}_\theta$ with causal attention handles the bidirectional generation task in an autoregressive manner. The input sequence of $\mathcal{M}_\theta$ consists of image and text tokens, a temporal token (TT) that encodes the time interval between two consecutive CXR scans, and a learnable cls (class) token. TT is placed first in the input sequence and cls last. During training, the order of the current report, current CXR, and previous CXR is shuffled; during inference, the missing modality is placed at the end of the input sequence.
  • Figure 2: (A) Report generation task: examples of current report generation (1) from the current CXR alone (CR$|$CX) and (2) from the previous and current CXRs (CR$|$(PX,CX)). (B) CXR image generation task: the image on the left of the report is (CX$|$CR) and the image on the right is (CX$|$(CR,PX)). The ground truth (GT) is shown for comparison.
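
The sequence layout described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the names `build_training_sequence`, `build_inference_prefix`, and `causal_mask`, the string placeholders `<TT>` and `<cls>`, and the toy token lists are all hypothetical. The caption does not say where cls sits relative to the generated tokens at inference, so the sketch returns only the known prefix and leaves that detail open.

```python
import random
from typing import Dict, List, Optional

TT = "<TT>"    # hypothetical placeholder for the temporal token
CLS = "<cls>"  # hypothetical placeholder for the learnable cls token


def build_training_sequence(
    segments: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Training layout: TT first, modality segments in shuffled order, cls last."""
    rng = rng or random.Random()
    names = list(segments)
    rng.shuffle(names)  # shuffle the order of CR, CX and PX segments
    seq = [TT]
    for name in names:
        seq.extend(segments[name])  # each segment stays contiguous
    seq.append(CLS)
    return seq


def build_inference_prefix(
    segments: Dict[str, List[str]],
    target: str,
) -> List[str]:
    """Inference layout: TT, then the known modalities.

    The target (missing) modality is generated after this prefix, so it
    ends up in the last part of the sequence. Assumption: cls handling at
    inference is not specified in the caption and is omitted here.
    """
    if target not in segments:
        raise KeyError(f"unknown modality: {target}")
    prefix = [TT]
    for name, tokens in segments.items():
        if name != target:
            prefix.extend(tokens)
    return prefix


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Toy tokens standing in for the outputs of the image and text tokenizers.
toy = {
    "PX": ["px0", "px1"],       # previous CXR tokens
    "CX": ["cx0", "cx1"],       # current CXR tokens
    "CR": ["r0", "r1", "r2"],   # current report tokens
}
train_seq = build_training_sequence(toy, random.Random(0))
report_prefix = build_inference_prefix(toy, target="CR")
mask = causal_mask(len(report_prefix))
```

Under these assumptions, generating the current report (CR$|$(PX,CX)) amounts to decoding autoregressively after `report_prefix`, while generating the current CXR (CX$|$(CR,PX)) uses `build_inference_prefix(toy, target="CX")` instead.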