Beyond Inference Intervention: Identity-Decoupled Diffusion for Face Anonymization

Haoxin Yang, Yihong Lin, Jingdan Kang, Xuemiao Xu, Yue Li, Cheng Xu, Shengfeng He

TL;DR

ID2Face addresses biometric privacy by eliminating reliance on inference-time interventions for face anonymization. It trains a conditional diffusion model with identity masking to learn an identity-decoupled latent space, using the Identity-Decoupled Latent Recomposer and Identity-Guided Latent Harmonizer to disentangle and fuse identity and non-identity information. An Orthogonal Identity Mapping strategy enforces orthogonality between source and anonymized identities, increasing anonymization strength while preserving utility. Comprehensive experiments on CelebA-HQ and FFHQ show superior identity removal, high visual fidelity, and robust privacy protection with rich identity diversity, highlighting the practical impact for privacy-preserving vision tasks.

Abstract

Face anonymization aims to conceal identity information while preserving non-identity attributes. Mainstream diffusion models rely on inference-time interventions such as negative guidance or energy-based optimization, which are applied post-training to suppress identity features. These interventions often introduce distribution shifts and entangle identity with non-identity attributes, degrading visual fidelity and data utility. To address this, we propose ID$^2$Face, a training-centric anonymization framework that removes the need for inference-time optimization. The rationale of our method is to learn a structured latent space where identity and non-identity information are explicitly disentangled, enabling direct and controllable anonymization at inference. To this end, we design a conditional diffusion model with an identity-masked learning scheme. An Identity-Decoupled Latent Recomposer uses an Identity Variational Autoencoder to model identity features, while non-identity attributes are extracted from same-identity pairs and aligned through bidirectional latent alignment. An Identity-Guided Latent Harmonizer then fuses these representations via soft-gating conditioned on noisy feature prediction. The model is trained with a recomposition-based reconstruction loss to enforce disentanglement. At inference, anonymization is achieved by sampling a random identity vector from the learned identity space. To further suppress identity leakage, we introduce an Orthogonal Identity Mapping strategy that enforces orthogonality between sampled and source identity vectors. Experiments demonstrate that ID$^2$Face outperforms existing methods in visual quality, identity suppression, and utility preservation.
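
The orthogonality constraint between the sampled and source identity vectors can be illustrated with a small sketch. The abstract does not spell out how Orthogonal Identity Mapping is computed, so the version below is only an assumption: it draws a Gaussian vector and removes its component along the source identity (a single Gram-Schmidt step). The function names, the 512-dimensional size, and the norm rescaling are hypothetical choices made for illustration, not details taken from the paper.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(a, a))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return dot(a, b) / (norm(a) * norm(b))


def sample_orthogonal_identity(source_id, rng):
    """Draw a random vector and project out its component along source_id.

    Assumption: a single Gram-Schmidt step stands in for the paper's
    Orthogonal Identity Mapping. source_id must be non-zero.
    """
    # Random candidate identity, here a standard Gaussian sample.
    z = [rng.gauss(0.0, 1.0) for _ in source_id]
    # Coefficient of the projection of z onto source_id.
    scale = dot(z, source_id) / dot(source_id, source_id)
    # Subtract the projection, leaving only the orthogonal part.
    z_orth = [zi - scale * si for zi, si in zip(z, source_id)]
    # Rescale so the result keeps the norm of the original sample.
    factor = norm(z) / norm(z_orth)
    return [factor * v for v in z_orth]


rng = random.Random(0)
source_id = [rng.gauss(0.0, 1.0) for _ in range(512)]  # hypothetical source identity
anon_id = sample_orthogonal_identity(source_id, rng)
print(abs(cosine(anon_id, source_id)) < 1e-9)
```

Without the projection step, a random sample would land at an arbitrary angle to the source identity; removing the parallel component pins the cosine similarity at zero up to floating-point error.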

Paper Structure

This paper contains 39 sections, 42 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: (a) Existing methods rely on inference-time intervention to erase identity, often resulting in suboptimal anonymization and distortion of non-identity features. (b) ID2Face introduces an inference-time-intervention-free framework that disentangles and harmonizes identity and non-identity features, achieving superior anonymization while preserving identity-irrelevant attributes.
  • Figure 2: Overview of the proposed ID2Face framework. The model learns an identity-decoupled latent space via identity-masked diffusion training, enabling anonymization without inference-time intervention. The Identity-Decoupled Latent Recomposer (IDLR) extracts identity vectors using an ID-VAE and recomposes them with non-identity cues from paired inputs with bidirectional alignment. The Identity-Guided Latent Harmonizer (IGLH) fuses these recomposed features via identity-guided soft-gating for fine-grained, spatially-aware control. At inference, a random identity vector is sampled from the learned space and constrained via Orthogonal Identity Mapping (OIM) to suppress identity leakage and maximize anonymization effectiveness.
  • Figure 3: Effectiveness of bidirectional latent alignment. (a) is the input image. (b) is the result of directly concatenating the non-identity embedding $e_{\text{non-id}}$ and the identity embedding $e_{\text{id}}$ to guide the diffusion model for face anonymization. (c) is our result. Simply concatenating $e_{\text{non-id}}$ and $e_{\text{id}}$ fails to preserve identity-independent low-level details.
  • Figure 4: Orthogonal sampling in the latent space of the ID-VAE. (a) is an obtuse angle case, while (b) is an acute angle case. Our method ensures that the sampled identity vector is always orthogonal to the original identity embedding, eliminating uncertainty from random alignment.
  • Figure 5: Qualitative comparison of face anonymization and recovery among different methods on the CelebA-HQ dataset. (a)-(h) are the original face images and the anonymization results of RiDDLE, G$^2$Face, AIDPro, DiffPrivacy, FAMS, NullFace, and our method, respectively. Note that RiDDLE, G$^2$Face, and AIDPro are GAN-based methods, while the others are diffusion-based methods. Our method achieves superior anonymization and image quality compared to the SOTA methods.
  • ...and 5 more figures
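
The identity-guided soft-gating mentioned in the Figure 2 caption can be pictured as an elementwise blend of two feature vectors. The sketch below is an assumption about the general shape of such a gate, not the IGLH module itself: the gate logits are passed in directly, whereas in a trained model they would presumably be predicted by a learned layer from the noisy features and the identity embedding. All names here are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real logit to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_gate_fuse(id_feat, non_id_feat, gate_logits):
    """Blend identity and non-identity features with a per-element soft gate.

    Assumption: fused = g * id_feat + (1 - g) * non_id_feat with
    g = sigmoid(gate_logit). A gate near 1 favours the identity feature,
    a gate near 0 favours the non-identity feature.
    """
    if not (len(id_feat) == len(non_id_feat) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for f_id, f_non, logit in zip(id_feat, non_id_feat, gate_logits):
        g = sigmoid(logit)
        fused.append(g * f_id + (1.0 - g) * f_non)
    return fused


# Toy inputs: identity features are all 1, non-identity features are all 0,
# so each fused value equals the gate value itself.
id_feat = [1.0, 1.0, 1.0]
non_id_feat = [0.0, 0.0, 0.0]
gate_logits = [-20.0, 0.0, 20.0]
fused = soft_gate_fuse(id_feat, non_id_feat, gate_logits)
print(fused)
```

A soft gate of this kind lets each position draw on identity or non-identity cues to a different degree, which is one way to read the "fine-grained, spatially-aware control" described in the caption.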