Generative Medical Image Anonymization Based on Latent Code Projection and Optimization

Huiyu Li, Nicholas Ayache, Hervé Delingette

TL;DR

This work tackles medical image anonymization by introducing a two-stage framework that first projects real images into a latent space using an AE-GAN with a co-training scheme, then optimizes the latent code via two deep losses, $L_{id}$ and $L_{ut}$, to balance identity removal with diagnostic utility. The anonymized latent code $W_A$, initialized from the projection latent $W$, is refined to obscure patient identity while preserving clinically relevant features. The two losses are defined as $L_{id}(X_R,X_A)=\max(0, \cos(\mathcal{E}_{id}(X_R),\mathcal{E}_{id}(X_A)) - m)$ and $L_{ut}(X_R,X_A)=\|\mathcal{E}_{ut}(X_R)-\mathcal{E}_{ut}(X_A)\|_2$. Evaluations on the MIMIC-CXR dataset show superior reconstruction fidelity and utility preservation under co-training compared to $\mathcal{E}$-training, alongside reduced identity leakage and robustness against membership inference attacks. The approach enables generating anonymized synthetic datasets suitable for training downstream lung pathology detectors, balancing privacy with data utility in a practical radiology setting.
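
To make the two loss definitions concrete, the following is a minimal sketch that evaluates them on plain feature vectors. It assumes the outputs of $\mathcal{E}_{id}$ and $\mathcal{E}_{ut}$ have already been computed and are passed in as lists of floats; the function names, the toy vectors, and the margin value are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identity_loss(feat_real, feat_anon, margin):
    """L_id = max(0, cos(E_id(X_R), E_id(X_A)) - m).

    The inputs stand in for identity features E_id(X_R) and E_id(X_A).
    The loss is zero once the cosine similarity drops to the margin or below.
    """
    return max(0.0, cosine_similarity(feat_real, feat_anon) - margin)


def utility_loss(feat_real, feat_anon):
    """L_ut = || E_ut(X_R) - E_ut(X_A) ||_2 (Euclidean distance).

    The inputs stand in for utility features E_ut(X_R) and E_ut(X_A).
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_real, feat_anon)))


# Toy vectors (hypothetical values, chosen only for easy arithmetic).
same_identity = identity_loss([3.0, 4.0], [3.0, 4.0], margin=0.2)
different_identity = identity_loss([1.0, 0.0], [0.0, 1.0], margin=0.2)
utility_gap = utility_loss([0.0, 0.0], [3.0, 4.0])
```

With these toy inputs, identical identity features are penalized (cosine similarity 1 exceeds the margin), orthogonal identity features incur no penalty, and the utility term grows with the distance between the utility features. How the two terms are weighted and minimized over $W_A$ is not specified in this summary, so that step is left out of the sketch.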

Abstract

Medical image anonymization aims to protect patient privacy by removing identifying information while preserving the data utility needed to solve downstream tasks. In this paper, we address the medical image anonymization problem with a two-stage solution: latent code projection and optimization. In the projection stage, we design a streamlined encoder to project input images into a latent space and propose a co-training scheme to enhance the projection process. In the optimization stage, we refine the latent code using two deep loss functions designed to address the trade-off between identity protection and data utility for medical images. Through a comprehensive set of qualitative and quantitative experiments, we showcase the effectiveness of our approach on the MIMIC-CXR chest X-ray dataset by generating anonymized synthetic images that can serve as a training set for detecting lung pathologies. Source code is available at https://github.com/Huiyu-Li/GMIA.

Paper Structure

This paper contains 7 sections, 2 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Overview of the proposed method, consisting of two key stages: (1) an AE-GAN network for latent code projection, and (2) latent code optimization using an identity removal loss and a utility-preserving loss.
  • Figure 2: Reconstruction results. The first row displays the real images $X_R$. The last two rows show the reconstructed images $\hat{X}_R$ produced by the proposed co-training scheme and the $\mathcal{E}$-training scheme, respectively.
  • Figure 3: Anonymization results. Real images $X_R$ randomly selected from the training, validation, and test sets are displayed in the first column. The corresponding reconstructed images $\hat{X}_R$ are displayed in the second column. The anonymized images $X_A$ are displayed in the last three columns.