Few-Shot Fingerprinting Subject Re-Identification in 3D-MRI and 2D-X-Ray
Gonçalo Gaspar Alves, Shekoufeh Gorgi Zadeh, Andreas Husch, Ben Bausch
TL;DR
This paper tackles data leakage in multi-dataset medical imaging by introducing subject fingerprinting, a few-shot, metric-learning approach that maps all images of a subject to a coherent latent representation. Using 2D and 3D ResNet-50 backbones trained with triplet margin loss and in-batch hard negative mining, the method achieves high recall in N-way K-shot retrieval on ChestXray-14 and BraTS-2021, including challenging settings with hundreds of candidate subjects (e.g., 500-way 5-shot on ChestXray-14). The results demonstrate strong within-subject clustering and between-subject separation, with visualization and clustering metrics supporting the approach. This work advances cross-modal and 3D subject fingerprinting as a practical tool for detecting near-duplicates and mitigating data leakage in medical image datasets.
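To make the training objective concrete, the following is a minimal sketch of a triplet margin loss with in-batch hard negative mining. It is not the authors' implementation: the Euclidean distance, the margin value, the choice of the single closest negative per anchor, and the function name `batch_hard_triplet_loss` are all assumptions made for illustration, and plain Python lists stand in for the ResNet-50 embeddings.

```python
import math

def batch_hard_triplet_loss(embeddings, subject_ids, margin=1.0):
    """Mean triplet margin loss over all (anchor, positive) pairs in a batch.

    For each anchor, the negative is the closest in-batch embedding that
    belongs to a different subject (the "hardest" negative).
    The margin and the distance metric are illustrative assumptions.
    """
    losses = []
    n = len(embeddings)
    for a in range(n):
        negatives = [j for j in range(n) if subject_ids[j] != subject_ids[a]]
        if not negatives:
            continue  # no other subject in the batch, so no triplet can be formed
        # Hardest negative: smallest distance to an embedding of another subject.
        d_neg = min(math.dist(embeddings[a], embeddings[j]) for j in negatives)
        for p in range(n):
            if p != a and subject_ids[p] == subject_ids[a]:
                d_pos = math.dist(embeddings[a], embeddings[p])
                # Hinge: penalize positives that are not closer than the
                # hardest negative by at least `margin`.
                losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses) if losses else 0.0

# Toy batch: two images each of hypothetical subjects "a" and "b".
well_separated = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
loss = batch_hard_triplet_loss(well_separated, ["a", "a", "b", "b"])
```

In this toy batch the two subjects are already farther apart than the margin, so every hinge term is zero. The loss becomes positive only when an image of another subject lies closer to the anchor than one of the anchor's own images plus the margin.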
Abstract
Combining open-source datasets can introduce data leakage if the same subject appears in multiple sets, leading to inflated model performance. To address this, we explore subject fingerprinting, mapping all images of a subject to a distinct region in latent space, to enable subject re-identification via similarity matching. Using a ResNet-50 trained with triplet margin loss, we evaluate few-shot fingerprinting on 3D MRI and 2D X-ray data in both standard (20-way 1-shot) and challenging (1000-way 1-shot) scenarios. The model achieves high Mean-Recall-@-K scores: 99.10% (20-way 1-shot) and 90.06% (500-way 5-shot) on ChestXray-14; 99.20% (20-way 1-shot) and 98.86% (100-way 3-shot) on BraTS-2021.
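The N-way K-shot retrieval evaluation described above can be sketched as follows. This is a hedged illustration rather than the paper's evaluation code. It assumes an episode consists of one query embedding and a gallery holding K embeddings for each of N subjects, and that a query counts as a hit when any of its k nearest gallery embeddings (Euclidean distance) belongs to the same subject. The helper names `recall_at_k` and `mean_recall_at_k` and the toy embeddings are hypothetical.

```python
import math

def recall_at_k(query, query_id, gallery, gallery_ids, k):
    """Return 1.0 if any of the k nearest gallery embeddings shares the
    query's subject ID, else 0.0 (assumed hit criterion)."""
    ranked = sorted(range(len(gallery)), key=lambda j: math.dist(query, gallery[j]))
    return 1.0 if any(gallery_ids[j] == query_id for j in ranked[:k]) else 0.0

def mean_recall_at_k(episodes, k):
    """Average recall@k over a list of (query, query_id, gallery, gallery_ids) episodes."""
    hits = [recall_at_k(q, qid, g, gids, k) for q, qid, g, gids in episodes]
    return sum(hits) / len(hits)

# Toy 3-way 1-shot gallery: one embedding per hypothetical subject.
gallery = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
gallery_ids = ["s1", "s2", "s3"]

episodes = [
    ([0.5, 0.2], "s1", gallery, gallery_ids),  # query lies next to its own subject
    ([4.0, 4.0], "s3", gallery, gallery_ids),  # query lies next to a different subject
]

r1 = mean_recall_at_k(episodes, k=1)
r3 = mean_recall_at_k(episodes, k=3)
```

Increasing N enlarges the gallery with more distractor subjects, which is why the many-way settings are harder than the 20-way setting for a fixed k.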
