A Novel Metric for Detecting Memorization in Generative Models for Brain MRI Synthesis

Antonio Scardace, Lemuel Puglisi, Francesco Guarnera, Sebastiano Battiato, Daniele Ravì

TL;DR

DeepSSIM introduces a self-supervised embedding-based metric to detect memorization in medical-image generative models, addressing privacy risks in brain MRI synthesis. By training embeddings to match ground-truth SSIM between image pairs, DeepSSIM yields a scalable measure s_θ(I_a,I_b) that remains robust to spatial misalignment. The memorization metric quantifies the fraction of training images with memorized duplicates in a large synthetic set, enabling high-throughput privacy auditing. Across brain MRI and chest X-ray experiments, DeepSSIM outperforms baselines in misaligned scenarios and offers substantial runtime advantages over SSIM, providing a practical tool for safeguarding patient confidentiality in synthetic medical data pipelines.
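The memorization metric described above can be illustrated with a small sketch. It assumes that embeddings for the training and synthetic images are already available, that similarity is the cosine similarity between them, and that a training image counts as memorized when its best match in the synthetic set reaches a cutoff. The function name `memorization_fraction` and the `threshold` parameter are hypothetical and chosen for illustration; the paper's exact definition and threshold selection may differ.

```python
import numpy as np

def memorization_fraction(train_emb, synth_emb, threshold):
    """Fraction of training images whose most similar synthetic image
    reaches `threshold` (a hypothetical cutoff, not taken from the paper).

    train_emb: (n_train, d) array of non-zero embeddings
    synth_emb: (n_synth, d) array of non-zero embeddings
    """
    # L2-normalize rows so that a dot product equals cosine similarity.
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    synth = synth_emb / np.linalg.norm(synth_emb, axis=1, keepdims=True)
    # (n_train, n_synth) matrix of pairwise similarities.
    sim = train @ synth.T
    # A training image is flagged if any synthetic image is close enough.
    has_duplicate = sim.max(axis=1) >= threshold
    return float(has_duplicate.mean())

# Toy example with 2-D embeddings: two of the four training rows
# point in the same direction as a synthetic row.
train_emb = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [0.0, -1.0]])
synth_emb = np.array([[2.0, 0.0], [1.0, 1.0]])
fraction = memorization_fraction(train_emb, synth_emb, threshold=0.9)
```

Because the comparison reduces to one matrix product over precomputed embeddings, this style of audit scales to large synthetic sets more easily than computing SSIM for every image pair.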

Abstract

Deep generative models have emerged as a transformative tool in medical imaging, offering substantial potential for synthetic data generation. However, recent empirical studies highlight a critical vulnerability: these models can memorize sensitive training data, posing significant risks of unauthorized patient information disclosure. Detecting memorization in generative models remains particularly challenging, necessitating scalable methods capable of identifying training data leakage across large sets of generated samples. In this work, we propose DeepSSIM, a novel self-supervised metric for quantifying memorization in generative models. DeepSSIM is trained to: i) project images into a learned embedding space and ii) force the cosine similarity between embeddings to match the ground-truth SSIM (Structural Similarity Index) scores computed in the image space. To capture domain-specific anatomical features, training incorporates structure-preserving augmentations, allowing DeepSSIM to estimate similarity reliably without requiring precise spatial alignment. We evaluate DeepSSIM in a case study involving synthetic brain MRI data generated by a Latent Diffusion Model (LDM) trained under memorization-prone conditions, using 2,195 MRI scans from two publicly available datasets (IXI and CoRR). Compared to state-of-the-art memorization metrics, DeepSSIM achieves superior performance, improving F1 scores by an average of +52.03% over the best existing method. Code and data of our approach are publicly available at the following link: https://github.com/brAIn-science/DeepSSIM.
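As a rough illustration of the training objective described in the abstract, the sketch below compares the cosine similarity of paired embeddings with precomputed SSIM targets. The mean-squared-error form of the loss is an assumption, since the abstract only states that the two quantities are forced to match. The names `cosine_similarity` and `ssim_matching_loss` are hypothetical, and the embeddings stand in for the output of the feature extractor.

```python
import numpy as np

def cosine_similarity(emb_a, emb_b, eps=1e-8):
    """Row-wise cosine similarity between two (N, D) embedding arrays."""
    num = np.sum(emb_a * emb_b, axis=1)
    den = np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
    # eps guards against division by zero for all-zero embeddings.
    return num / np.maximum(den, eps)

def ssim_matching_loss(emb_a, emb_b, ssim_targets):
    """Mean squared error between embedding cosine similarity and the
    ground-truth SSIM of each image pair (MSE is an assumed loss form)."""
    pred = cosine_similarity(emb_a, emb_b)
    return float(np.mean((pred - ssim_targets) ** 2))

# Toy example: pair 0 has identical embeddings, pair 1 has orthogonal ones.
emb_a = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_b = np.array([[1.0, 0.0], [1.0, 0.0]])
loss_matched = ssim_matching_loss(emb_a, emb_b, np.array([1.0, 0.0]))
loss_mismatched = ssim_matching_loss(emb_a, emb_b, np.array([0.5, 0.5]))
```

In an actual training loop the same quantity would be computed in an autodiff framework so that gradients flow back to the feature extractor's parameters; NumPy is used here only to keep the arithmetic visible.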

Paper Structure

This paper contains 25 sections, 5 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: The DeepSSIM self-supervised training process, divided into four stages: (i) Inputs: real and synthetic MRI scans are first preprocessed. (ii) Augmentation: random augmentations are applied to the images before feature extraction to make the process robust to image variation. (iii) Embeddings: images are mapped into a lower-dimensional embedding space using a feature extractor $f_\theta$. (iv) Model Optimization: the cosine similarity between the embeddings is compared directly with the ground-truth SSIM to compute the loss and optimize the model parameters $\theta$.
  • Figure 2: Example from the Brain MRI dataset. The figure shows a real training image alongside three synthetic counterparts generated by the LDM and manually labeled by our experts.
  • Figure 3: The histograms show the distribution of similarity scores for image pairs in the Brain MRI dataset, separated by class, across all competing methods. The vertical lines represent the classification thresholds applied by each method.
  • Figure 4: Example from the Chest X-ray dataset. The figure shows a real training image alongside three synthetic counterparts generated by the LDM and manually labeled by our experts.
  • Figure 5: The histograms show the distribution of similarity scores for image pairs in the Chest X-ray dataset, separated by class, across all competing methods. The vertical lines represent the classification thresholds applied by each method.
  • ...and 1 more figure