A Calibrated Memorization Index (MI) for Detecting Training Data Leakage in Generative MRI Models

Yash Deo, Yan Jia, Toni Lassila, Victoria J Hodge, Alejandro F Frangi, Chenghao Qian, Siyuan Kang, Ibrahim Habli

TL;DR

The paper addresses privacy risks from memorization in generative MRI models and shows that standard fidelity metrics can be misleading when data leakage occurs. It introduces a calibrated Memorization Index (MI) and Overfit/Novelty Index (ONI) computed in an MRI-domain, multi-scale, transformer-based feature space, using per-layer whitening and geometric-mean aggregation across layers. Each raw score $s_j$ is calibrated against an empirical null with mean $\mu_{\text{null}}$ and standard deviation $\sigma_{\text{null}}$ via $MI_j=\frac{s_j-\mu_{\text{null}}}{\sigma_{\text{null}}}$ and $ONI_j=-\tanh(MI_j)$. At the set level, the approach yields detection that is monotonic in the duplication percentage and stable under augmentation; at the sample level, it yields reliable per-sample scores across multiple MRI datasets. This enables practical data curation even when fidelity metrics suggest improved quality. The framework can be extended to other medical imaging domains by adopting appropriate domain-specific feature spaces.
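The calibration step stated above can be illustrated with a minimal Python sketch. It assumes the raw scores $s_j$ and a set of null scores are already available as plain floats. The function name `calibrate_scores`, the use of the sample standard deviation, and the toy numbers are illustrative assumptions, not details taken from the paper.

```python
import math
import statistics


def calibrate_scores(scores, null_scores):
    """Map raw scores s_j to (MI_j, ONI_j) pairs using an empirical null.

    MI_j  = (s_j - mu_null) / sigma_null
    ONI_j = -tanh(MI_j)
    """
    mu_null = statistics.fmean(null_scores)
    # Assumption: sample standard deviation; the summary does not say which estimator is used.
    sigma_null = statistics.stdev(null_scores)
    if sigma_null == 0:
        raise ValueError("null scores have zero spread; cannot calibrate")

    results = []
    for s in scores:
        mi = (s - mu_null) / sigma_null
        oni = -math.tanh(mi)  # bounded in (-1, 1); high MI maps toward -1
        results.append((mi, oni))
    return results


# Toy, made-up numbers: a null centred at 0.5, one typical score, one suspiciously high score.
null_scores = [0.40, 0.50, 0.60]
pairs = calibrate_scores([0.50, 0.90], null_scores)
for mi, oni in pairs:
    print(f"MI={mi:.3f}  ONI={oni:.3f}")
```

Under this convention, a sample whose score sits at the null mean gets MI near 0 and ONI near 0, while a sample far above the null gets a large positive MI and an ONI approaching -1.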

Abstract

Image generative models are known to duplicate images from the training data as part of their outputs, which can lead to privacy concerns when used for medical image generation. We propose a calibrated per-sample metric for detecting memorization and duplication of training data. Our metric uses image features extracted using an MRI foundation model, aggregates multi-layer whitened nearest-neighbor similarities, and maps them to a bounded Overfit/Novelty Index (ONI) and Memorization Index (MI) scores. Across three MRI datasets with controlled duplication percentages and typical image augmentations, our metric robustly detects duplication and provides more consistent metric values across datasets. At the sample level, our metric achieves near-perfect detection of duplicates.
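As a rough illustration of "multi-layer whitened nearest-neighbor similarities", the sketch below whitens each layer's features, takes each query's nearest-neighbor cosine similarity to the training set, and combines layers with a geometric mean. Several choices here are assumptions rather than the authors' method: random arrays stand in for foundation-model features, whitening is ZCA-style and fitted on the training features, and cosine similarity is rescaled to [0, 1] so the geometric mean is well defined. All function names are hypothetical.

```python
import numpy as np


def whiten(train, query, eps=1e-6):
    """ZCA-style whitening fitted on training features, applied to both sets (assumed variant)."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    scale = 1.0 / np.sqrt(np.maximum(eigvals, 0.0) + eps)
    w = eigvecs @ np.diag(scale) @ eigvecs.T
    return (train - mean) @ w, (query - mean) @ w


def nn_cosine(train, query):
    """Cosine similarity of each query row to its nearest training row."""
    t = train / (np.linalg.norm(train, axis=1, keepdims=True) + 1e-12)
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + 1e-12)
    return (q @ t.T).max(axis=1)


def aggregate_score(train_layers, query_layers):
    """Geometric mean over layers of rescaled nearest-neighbor similarities."""
    per_layer = []
    for train, query in zip(train_layers, query_layers):
        tw, qw = whiten(train, query)
        sim01 = (1.0 + nn_cosine(tw, qw)) / 2.0  # assumption: map [-1, 1] to [0, 1]
        per_layer.append(np.clip(sim01, 1e-12, 1.0))
    return np.exp(np.mean(np.log(np.stack(per_layer)), axis=0))


# Stand-in features for two "layers"; real features would come from an MRI foundation model.
rng = np.random.default_rng(0)
train_layers = [rng.normal(size=(50, 4)), rng.normal(size=(50, 8))]
# Query 0 is an exact copy of training sample 0; query 1 is a fresh random sample.
query_layers = [np.vstack([t[0], rng.normal(size=t.shape[1])]) for t in train_layers]

scores = aggregate_score(train_layers, query_layers)
print(scores)
```

In this toy setup the duplicated sample scores close to 1 and the fresh sample scores lower. The resulting raw scores would then be passed through the null calibration to obtain MI and ONI.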

Paper Structure (10 sections, 6 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Overview of our methodology to calculate the Memorization Index (MI) and the Overfit/Novelty Index (ONI).
  • Figure 2: Metric response to duplication under augmentations. (a) MI increases near-linearly and remains tight across augmentations. (b) CT spreads across augmentations, especially at higher duplication percentages. (c-d) FID/MMD decrease as duplication increases, which can be misleading if used as the only quality metric. (e) AuthPct is highly augmentation-sensitive. (f) Vendi shows little/no signal.