Few-Shot Fingerprinting Subject Re-Identification in 3D-MRI and 2D-X-Ray

Gonçalo Gaspar Alves, Shekoufeh Gorgi Zadeh, Andreas Husch, Ben Bausch

TL;DR

This paper tackles data leakage in multi-dataset medical imaging by introducing subject fingerprinting, a few-shot, metric-learning approach that maps all images of a subject to a coherent latent representation. Using 2D and 3D ResNet-50 backbones trained with triplet margin loss and in-batch hard negative mining, the method achieves high recall in N-way K-shot retrieval on ChestXray-14 and BraTS-2021, including challenging settings. The results demonstrate strong within-subject clustering and between-subject separation, with visualization and clustering metrics supporting the approach. This work advances cross-modal and 3D subject fingerprinting as a practical tool for detecting near-duplicates and mitigating data leakage in medical image datasets.
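To make the training objective concrete, the following is a minimal sketch of a triplet margin loss with in-batch hard negative mining, written in plain Python on toy 2D embeddings. It is not the authors' implementation: the function name `batch_hard_triplet_loss`, the margin of 0.2, the Euclidean distance, and the choice to pick the single closest other-subject sample per anchor are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, subject_ids, margin=0.2):
    """Mean triplet margin loss over all (anchor, positive) pairs in a batch.

    For each anchor, the negative is the closest embedding that belongs to a
    different subject (the "hardest" in-batch negative). The margin value and
    the mining rule are illustrative assumptions, not taken from the paper.
    """
    losses = []
    n = len(embeddings)
    for a in range(n):
        negatives = [j for j in range(n) if subject_ids[j] != subject_ids[a]]
        if not negatives:
            continue  # no other subject in the batch, nothing to contrast with
        d_an = min(euclidean(embeddings[a], embeddings[j]) for j in negatives)
        for p in range(n):
            if p == a or subject_ids[p] != subject_ids[a]:
                continue  # positives are other images of the same subject
            d_ap = euclidean(embeddings[a], embeddings[p])
            # Hinge: zero once the positive is closer than the negative by `margin`.
            losses.append(max(0.0, d_ap - d_an + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Two subjects that are already well separated: every triplet satisfies the margin.
separated = batch_hard_triplet_loss(
    [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]], [0, 0, 1, 1]
)

# A subject-1 image sitting between two subject-0 images: the loss is positive.
overlapping = batch_hard_triplet_loss(
    [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]], [0, 0, 1]
)
```

In the separated batch the loss is zero, so those samples would contribute no gradient; in the overlapping batch the loss is positive, which is the signal that pulls same-subject images together and pushes the intruding subject away.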

Abstract

Combining open-source datasets can introduce data leakage if the same subject appears in multiple sets, leading to inflated model performance. To address this, we explore subject fingerprinting, mapping all images of a subject to a distinct region in latent space, to enable subject re-identification via similarity matching. Using a ResNet-50 trained with triplet margin loss, we evaluate few-shot fingerprinting on 3D MRI and 2D X-ray data in both standard (20-way 1-shot) and challenging (1000-way 1-shot) scenarios. The model achieves high Mean-Recall-@-K scores: 99.10% (20-way 1-shot) and 90.06% (500-way 5-shot) on ChestXray-14; 99.20% (20-way 1-shot) and 98.86% (100-way 3-shot) on BraTS-2021.
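The N-way K-shot evaluation can be pictured as a retrieval episode: N subjects each contribute K reference (support) embeddings, and each query image is matched against them by distance. The sketch below computes a recall-at-k score for one such episode in plain Python. The function name `recall_at_k`, the Euclidean distance, the toy 2D embeddings, and the hit criterion (the true subject appears among the k nearest support embeddings) are assumptions for illustration; the paper's exact Mean-Recall-@-K definition may differ.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recall_at_k(queries, support, k=1):
    """Fraction of queries whose true subject is among the k nearest supports.

    `queries` and `support` are lists of (embedding, subject_id) pairs.
    This hit criterion is an assumed definition, used here for illustration.
    """
    hits = 0
    for q_emb, q_id in queries:
        ranked = sorted(support, key=lambda s: euclidean(q_emb, s[0]))
        top_ids = {subject_id for _, subject_id in ranked[:k]}
        hits += q_id in top_ids
    return hits / len(queries)


# Toy 3-way 1-shot episode: one support embedding per subject.
support = [([0.0, 0.0], "s1"), ([5.0, 5.0], "s2"), ([10.0, 0.0], "s3")]
queries = [
    ([0.2, 0.1], "s1"),  # nearest support is s1: hit
    ([4.8, 5.1], "s2"),  # nearest support is s2: hit
    ([6.0, 4.0], "s3"),  # nearest support is s2, s3 is second: miss at k=1
]

r1 = recall_at_k(queries, support, k=1)
r2 = recall_at_k(queries, support, k=2)
```

Averaging this score over many randomly sampled episodes gives a mean recall figure; larger N (more candidate subjects) makes each episode harder because more support embeddings compete to be the nearest match.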

Paper Structure

This paper contains 12 sections, 5 equations, 2 figures, 2 tables, 1 algorithm.

Figures (2)

  • Figure 1.1: t-SNE plots of 128-dimensional image embeddings from the ChestXray-14 (A) and the BraTS-2021 (B) test sets, highlighting a selection of subjects. (A & B, left) Untrained ResNet-50 encodings, showing no clustering of subjects. (A & B, right) ResNet-50 trained with triplet margin loss, showing clustered and well-separated subjects in latent space.
  • Figure 1.2: (A) Abstraction of the MIASD and MIESD. (B) Binned intra-subject distances (blue) and inter-subject distances (red) for all subjects in the ChestXray-14 dataset.