Generative Medical Image Anonymization Based on Latent Code Projection and Optimization
Huiyu Li, Nicholas Ayache, Hervé Delingette
TL;DR
This work tackles medical image anonymization with a two-stage framework. The first stage projects real images into a latent space using an AE-GAN trained with a co-training scheme. The second stage optimizes the latent code with two deep losses, $L_{id}$ and $L_{ut}$, to balance identity removal against diagnostic utility. The anonymized latent code $W_A$, initialized from the projected latent code $W$, is refined to obscure patient identity while preserving clinically relevant features. For a real image $X_R$ and its anonymized counterpart $X_A$, the two losses are defined as $L_{id}(X_R,X_A)=\max(0, \cos(\mathcal{E}_{id}(X_R),\mathcal{E}_{id}(X_A)) - m)$, with margin $m$, and $L_{ut}(X_R,X_A)=\|\mathcal{E}_{ut}(X_R)-\mathcal{E}_{ut}(X_A)\|_2$. Evaluations on the MIMIC-CXR dataset show better reconstruction fidelity and utility preservation under co-training than under $\mathcal{E}$-training, alongside reduced identity leakage and robustness against membership inference attacks. The approach enables the generation of anonymized synthetic datasets suitable for training downstream lung pathology detectors, balancing privacy with data utility in a practical radiology setting.
Abstract
Medical image anonymization aims to protect patient privacy by removing identifying information while preserving the data utility needed to solve downstream tasks. In this paper, we address the medical image anonymization problem with a two-stage solution: latent code projection and optimization. In the projection stage, we design a streamlined encoder to project input images into a latent space and propose a co-training scheme to enhance the projection process. In the optimization stage, we refine the latent code using two deep loss functions, designed for medical images, that address the trade-off between identity protection and data utility. Through a comprehensive set of qualitative and quantitative experiments, we showcase the effectiveness of our approach on the MIMIC-CXR chest X-ray dataset by generating anonymized synthetic images that can serve as a training set for detecting lung pathologies. Source code is available at https://github.com/Huiyu-Li/GMIA.
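
To make the two loss terms stated in the TL;DR concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the feature vectors $\mathcal{E}_{id}(X)$ and $\mathcal{E}_{ut}(X)$ have already been computed by some identity and utility encoders. The function names, the list-based vector representation, and the margin value used in the example are all illustrative assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identity_loss(id_real: Sequence[float],
                  id_anon: Sequence[float],
                  margin: float) -> float:
    """L_id = max(0, cos(E_id(X_R), E_id(X_A)) - m).

    The inputs are assumed to be precomputed identity embeddings.
    The loss is zero once the similarity drops to the margin or below.
    """
    return max(0.0, cosine_similarity(id_real, id_anon) - margin)


def utility_loss(ut_real: Sequence[float],
                 ut_anon: Sequence[float]) -> float:
    """L_ut = || E_ut(X_R) - E_ut(X_A) ||_2 on precomputed utility features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ut_real, ut_anon)))


# Illustrative toy vectors; the margin 0.5 is an assumed value.
same_identity = identity_loss([1.0, 0.0], [1.0, 0.0], margin=0.5)
different_identity = identity_loss([1.0, 0.0], [0.0, 1.0], margin=0.5)
feature_gap = utility_loss([0.0, 0.0], [3.0, 4.0])
```

In this toy example, identical identity embeddings are penalized while orthogonal ones incur no identity loss, and the utility loss grows with the distance between the utility features. How the two terms are weighted against each other during latent code optimization is not specified here, so no combined objective is shown.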
