BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval

Yinda Chen; Che Liu; Xiaoyu Liu; Rossella Arcucci; Zhiwei Xiong

TL;DR

The paper addresses the lack of robust benchmarks for 3D medical text-image retrieval by introducing BIMCV-R, a public dataset of 8,069 3D CT volumes (over 2 million slices) paired with radiology reports and expert annotations. It also presents MedFinder, a dual-stream retrieval framework that combines BiomedCLIP-based language representations, text sampling, view consistency, and cross-attention fusion to align 3D CT imagery with clinical narratives and support keyword-based search. MedFinder is optimized with the joint objective $L_{total} = L_{mse} + \alpha L_{sim}$. The authors report better performance than baselines on multimodal retrieval and demonstrate keyword-based retrieval, highlighting the potential of large language models for 3D medical image retrieval. BIMCV-R is positioned as a foundational benchmark for scalable, clinician-friendly, text-guided retrieval of 3D medical imaging data, with relevance to diagnostic support and case-based reference.
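To make the shape of the joint objective $L_{total} = L_{mse} + \alpha L_{sim}$ concrete, here is a minimal sketch. This summary does not define the two terms, so the code assumes that $L_{mse}$ is a mean squared error between a paired image embedding and text embedding, and that $L_{sim}$ is one minus their cosine similarity. The function names and the default `alpha` are hypothetical and are not taken from the paper.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def total_loss(image_emb, text_emb, alpha=0.5):
    """L_total = L_mse + alpha * L_sim for one image-text pair.

    Assumption: L_sim is taken as (1 - cosine similarity); the paper's
    exact definition may differ. alpha=0.5 is an illustrative value.
    """
    l_mse = mse(image_emb, text_emb)
    l_sim = 1.0 - cosine_similarity(image_emb, text_emb)
    return l_mse + alpha * l_sim
```

Under these assumptions, identical embeddings give a loss of zero, and `alpha` controls how strongly the similarity term is weighted against the regression term.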

Abstract

The burgeoning integration of 3D medical imaging into healthcare has led to a substantial increase in the workload of medical professionals. To assist clinicians in their diagnostic processes and alleviate their workload, the development of a robust system for retrieving similar case studies presents a viable solution. While the concept holds great promise, the field of 3D medical text-image retrieval is currently limited by the absence of robust evaluation benchmarks and curated datasets. To remedy this, our study presents a groundbreaking dataset, BIMCV-R, which includes an extensive collection of 8,069 3D CT volumes, encompassing over 2 million slices, paired with their respective radiological reports. Expanding upon the foundational work of our dataset, we craft a retrieval strategy, MedFinder. This approach employs a dual-stream network architecture, harnessing the potential of large language models to advance the field of medical image retrieval beyond existing text-image retrieval solutions. It marks our preliminary step towards developing a system capable of facilitating text-to-image, image-to-text, and keyword-based retrieval tasks. Our project is available at https://huggingface.co/datasets/cyd0806/BIMCV-R.
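As a rough illustration of the text-to-image retrieval task described above, the sketch below ranks CT volumes by the cosine similarity between a query text embedding and precomputed volume embeddings. It does not reproduce MedFinder's encoders. The embeddings, the volume identifiers, and the function names are all hypothetical, and cosine similarity is an assumed scoring function.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_volumes(query_emb, volume_embs):
    """Return volume ids sorted from most to least similar to the query.

    volume_embs maps a (hypothetical) volume id to its embedding vector.
    """
    scored = [(cosine(query_emb, emb), vid) for vid, emb in volume_embs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vid for _, vid in scored]


# Toy 2-D embeddings standing in for real encoder outputs.
toy_volumes = {"ct_a": [1.0, 0.0], "ct_b": [0.0, 1.0], "ct_c": [1.0, 1.0]}
ranking = rank_volumes([1.0, 0.1], toy_volumes)
```

Image-to-text retrieval follows the same pattern with the roles swapped: the query is a volume embedding and the candidates are report embeddings.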

Paper Structure (14 sections, 7 equations, 5 figures, 4 tables)

Introduction
Dataset
Data Acquisition and Processing.
Data Statistics Analysis.
Methodology
Overview.
Textual Feature Extracting.
Visual Feature Extracting.
Similarity Matching.
Experiments and Results
Data Splitting and Metrics.
Results.
Ablation Study.
Conclusion

Figures (5)

Figure 1: Construction of the BIMCV-R dataset. Utilizing the BIMCV dataset, we enhanced image quality through selective filtering, advanced denoising, and size standardization. For textual data, we translated radiological reports into English and refined them with GPT-4, ensuring consistency. Expert reviews and diagnoses further ensured data reliability and accuracy.
Figure 1: Summary of Image and Report Statistics.
Figure 2: Sample data of BIMCV-R.
Figure 3: Left: Word Frequency Analysis. Right: Word Cloud Analysis.
Figure 4: An overview of our method, divided into textual feature extraction, visual feature extraction, and similarity matching.
