Content-Based Image Retrieval for Multi-Class Volumetric Radiology Images: A Benchmark Study
Farnaz Khun Jush, Steffen Vogler, Tuan Truong, Matthias Lenga
TL;DR
This work addresses the challenge of content-based image retrieval (CBIR) for 3D volumetric radiology images by introducing a benchmark based on the TotalSegmentator dataset, enabling region-based and localized multi-organ retrieval. It combines a scalable vector-indexing pipeline with diverse 2D slice embeddings and a ColBERT-inspired late interaction re-ranking step to improve volumetric recall. Across 29 coarse and 104 fine anatomical structures, pre-trained embeddings from self-supervised sources and DreamSim-based ensembles achieve high recall, with region-based and localized retrieval reaching near-perfect recall in many cases and re-ranking providing notable gains. The study demonstrates the feasibility and utility of a standardized CBIR benchmark for medical imaging, highlights the value of re-ranking for context-aware search, and offers guidance on embedding choices and evaluation metrics for real-world clinical retrieval tasks.
Abstract
While content-based image retrieval (CBIR) has been extensively studied for natural images, its application to medical images presents ongoing challenges, primarily due to the 3D nature of medical images. Recent studies have shown the potential of pre-trained vision embeddings for CBIR in radiology. However, a benchmark for the retrieval of 3D volumetric medical images is still lacking, which hinders objective evaluation and comparison of the efficiency of proposed CBIR approaches in medical imaging. In this study, we extend previous work and establish a benchmark for region-based and localized multi-organ retrieval using the TotalSegmentator dataset (TS), which provides detailed multi-organ annotations. We benchmark embeddings derived from models pre-trained with supervision on medical images against embeddings derived from models pre-trained without supervision on non-medical images, for 29 coarse and 104 detailed anatomical structures at the volume and region levels. For volumetric image retrieval, we adopt a late interaction re-ranking method inspired by text matching. We compare it against the original method proposed for volume and region retrieval and achieve a retrieval recall of 1.0 for diverse anatomical regions with a wide size range. The findings and methodologies presented in this paper provide insights and benchmarks for further development and evaluation of CBIR approaches in the context of medical imaging.
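To make the late interaction idea concrete, the following is a minimal sketch of ColBERT-style MaxSim scoring applied to volumes represented as sets of 2D slice embeddings. It is an illustration under stated assumptions, not necessarily the exact procedure used in this study: the function names (`maxsim_score`, `rerank`), the use of cosine similarity, and the toy 2-dimensional embeddings are all hypothetical. Each query slice is matched to its most similar candidate slice, and the per-slice maxima are summed to score the candidate volume.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def maxsim_score(query_slices, candidate_slices):
    # ColBERT-style late interaction: for every query slice embedding, take the
    # maximum cosine similarity over all candidate slice embeddings, then sum.
    if not query_slices or not candidate_slices:
        return 0.0
    q = [l2_normalize(v) for v in query_slices]
    c = [l2_normalize(v) for v in candidate_slices]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, cv)) for cv in c)
    return total

def rerank(query_slices, candidates):
    # candidates: dict mapping a (hypothetical) volume id to its slice embeddings.
    # Returns (volume_id, score) pairs sorted from best to worst match.
    scored = [(vid, maxsim_score(query_slices, slices))
              for vid, slices in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy example with made-up 2-D embeddings (real embeddings are high-dimensional).
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "vol_a": [[1.0, 0.0], [0.0, 2.0]],  # contains a close match for each query slice
    "vol_b": [[1.0, 1.0]],              # single slice, partially similar to both
}
ranking = rerank(query, candidates)
print(ranking)
```

In a pipeline of this kind, the candidate set would typically come from a first-stage vector-index search over slice embeddings, and the MaxSim score would only be computed for that shortlist, which keeps re-ranking cheap relative to scoring every volume in the database.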
