Beyond Inference Intervention: Identity-Decoupled Diffusion for Face Anonymization
Haoxin Yang, Yihong Lin, Jingdan Kang, Xuemiao Xu, Yue Li, Cheng Xu, Shengfeng He
TL;DR
ID²Face addresses biometric privacy by removing the reliance on inference-time interventions for face anonymization. It trains a conditional diffusion model with identity masking to learn an identity-decoupled latent space, using an Identity-Decoupled Latent Recomposer and an Identity-Guided Latent Harmonizer to disentangle and then fuse identity and non-identity information. An Orthogonal Identity Mapping strategy enforces orthogonality between source and anonymized identities, increasing anonymization strength while preserving utility. Experiments on CelebA-HQ and FFHQ show superior identity removal, high visual fidelity, and robust privacy protection with rich identity diversity, which supports its use in privacy-preserving vision tasks.
Abstract
Face anonymization aims to conceal identity information while preserving non-identity attributes. Mainstream diffusion models rely on inference-time interventions such as negative guidance or energy-based optimization, which are applied post-training to suppress identity features. These interventions often introduce distribution shifts and entangle identity with non-identity attributes, degrading visual fidelity and data utility. To address this, we propose ID²Face, a training-centric anonymization framework that removes the need for inference-time optimization. The rationale of our method is to learn a structured latent space where identity and non-identity information are explicitly disentangled, enabling direct and controllable anonymization at inference. To this end, we design a conditional diffusion model with an identity-masked learning scheme. An Identity-Decoupled Latent Recomposer uses an Identity Variational Autoencoder to model identity features, while non-identity attributes are extracted from same-identity pairs and aligned through bidirectional latent alignment. An Identity-Guided Latent Harmonizer then fuses these representations via soft-gating conditioned on noisy feature prediction. The model is trained with a recomposition-based reconstruction loss to enforce disentanglement. At inference, anonymization is achieved by sampling a random identity vector from the learned identity space. To further suppress identity leakage, we introduce an Orthogonal Identity Mapping strategy that enforces orthogonality between sampled and source identity vectors. Experiments demonstrate that ID²Face outperforms existing methods in visual quality, identity suppression, and utility preservation.
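The abstract does not specify how the Orthogonal Identity Mapping is computed. As a rough illustration of what "enforcing orthogonality between sampled and source identity vectors" could look like, the sketch below draws a random identity vector and removes its component along the source identity with a single Gram-Schmidt projection step. The projection, the standard-normal sampling (a stand-in for the learned identity space), the 512-dimensional vectors, and all function names are assumptions made for this example, not the authors' method.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(sampled, source):
    """Remove from `sampled` its component along `source` (one Gram-Schmidt step)."""
    denom = dot(source, source)
    if denom == 0.0:
        raise ValueError("source identity vector must be non-zero")
    scale = dot(sampled, source) / denom
    return [s - scale * t for s, t in zip(sampled, source)]


def cosine(u, v):
    """Cosine similarity, a common proxy for identity similarity."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def sample_anonymous_identity(source, rng):
    # Assumption: a standard-normal draw stands in for sampling from the
    # learned identity space described in the abstract.
    sampled = [rng.gauss(0.0, 1.0) for _ in source]
    return orthogonalize(sampled, source)


rng = random.Random(0)
source_id = [rng.gauss(0.0, 1.0) for _ in range(512)]  # hypothetical source identity
anon_id = sample_anonymous_identity(source_id, rng)
```

Under these assumptions, `cosine(anon_id, source_id)` is zero up to floating-point error, so the sampled identity carries no linear component of the source identity, while the randomness in the remaining directions still allows diverse anonymized identities.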
