NullFace: Training-Free Localized Face Anonymization

Han-Wei Kung, Tuomas Varanka, Terence Sim, Nicu Sebe

TL;DR

NullFace introduces a training-free face anonymization method that preserves non-identity attributes. It inverts the input image with a pre-trained diffusion model to recover the initial noise, then applies an identity embedding that negates the original identity during denoising. By combining DDPM inversion with an IP-Adapter conditioned on negated identity embeddings and a dual-path (conditional/unconditional) denoising framework, the approach achieves effective anonymization while maintaining gaze, pose, and expressions. The method supports localized anonymization via segmentation masks, enabling selective privacy control valuable in medical and behavioral research. Empirical results on CelebA-HQ and FFHQ show competitive re-identification reduction, strong attribute preservation, and high image quality, with ablations highlighting the critical role of inversion and the tunability through $T_{\text{skip}}$, $\lambda_{id}$, and $\lambda_{cfg}$. Overall, NullFace offers a practical, training-free, privacy-preserving solution with real-world applicability and robust resistance to identity recovery attacks.

Abstract

Privacy concerns are growing in today's digital age as the number of cameras keeps increasing. Although existing anonymization methods are able to obscure identity information, they often struggle to preserve the utility of the images. In this work, we introduce a training-free method for face anonymization that preserves key non-identity-related attributes. Our approach utilizes a pre-trained text-to-image diffusion model without requiring optimization or training. It begins by inverting the input image to recover its initial noise. The noise is then denoised through an identity-conditioned diffusion process, where modified identity embeddings ensure the anonymized face is distinct from the original identity. Our approach also supports localized anonymization, giving users control over which facial regions are anonymized or kept intact. Comprehensive evaluations against state-of-the-art methods show our approach excels in anonymization, attribute preservation, and image quality. Its flexibility, robustness, and practicality make it well-suited for real-world applications. Code and data can be found at https://github.com/hanweikung/nullface.
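
The localized anonymization mentioned above can be illustrated with a small sketch. It assumes a simple per-element blend between an anonymized latent and the original latent under a binary mask; the function name `blend_with_mask`, the flat-list representation, and the blend rule itself are illustrative assumptions, not the authors' implementation.

```python
def blend_with_mask(anonymized, original, mask):
    """Blend two equally sized latents element by element.

    mask value 1.0 takes the anonymized element, 0.0 keeps the original.
    This blend rule is an assumption made for illustration only.
    """
    if not (len(anonymized) == len(original) == len(mask)):
        raise ValueError("anonymized, original, and mask must have the same length")
    return [m * a + (1.0 - m) * o for a, o, m in zip(anonymized, original, mask)]


# Toy 1-D "latent" with four positions; only the first two are anonymized.
original = [0.1, 0.2, 0.3, 0.4]
anonymized = [0.9, 0.8, 0.7, 0.6]
mask = [1.0, 1.0, 0.0, 0.0]

blended = blend_with_mask(anonymized, original, mask)
print(blended)
```

In a real pipeline the mask would come from a face segmentation model and the blend would be repeated at each denoising iteration; here a single blend on toy values shows only the selection behavior.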

Paper Structure

This paper contains 32 sections, 4 equations, 22 figures, 3 tables.

Figures (22)

  • Figure 1: Our method obscures identity while preserving attributes such as gaze, expressions, and head pose (in contrast to Stable Diffusion Inpainting rombach2022high) and enables selective anonymization of specific facial regions.
  • Figure 2: Face anonymization pipeline using diffusion model inversion. Starting with an input facial image, we perform DDPM inversion huberman2024edit to retrieve the initial noise map $x_T$ and a sequence of noise maps $\{z_t\}$ from the diffusion process. Face embeddings are extracted using a face recognition model deng2019arcface and negated with a hyperparameter $\lambda_{id}$, creating negative identity guides. These guides steer the model away from reconstructing the original identity during denoising. The denoising process begins with $x_T$, combining conditional and unconditional paths. The conditional path utilizes negated identity embeddings to obscure identifiable features, while the unconditional path uses null embeddings ($\varnothing$) to preserve non-identifying attributes. Outputs from both paths are merged using a guidance scale parameter $\lambda_{cfg}$ through \ref{eq:cfg}. Lastly, optional masks can be applied at each iteration to control which facial features are anonymized or retained, enabling localized anonymization.
  • Figure 3: As $T_{\text{skip}}$ increases from 0 to higher values, the generated image progressively aligns more closely with the input, ultimately achieving near-perfect reconstruction.
  • Figure 4: Increasing $\lambda_{id}$ generates faces that are less similar to the original, with FaceNet schroff2015facenet identity distance values shown for each example.
  • Figure 5: As the guidance scale increases, the anonymized identities become increasingly distinct from the originals, as confirmed by identity distance measurements using FaceNet schroff2015facenet. However, the version generated with a guidance scale of 20 reveals that excessively high guidance scales, while widening identity distance, compromise the photorealism of the resulting images.
  • ...and 17 more figures
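
The dual-path merge described in the Figure 2 caption can be sketched as follows. The equation the caption refers to is not reproduced in this excerpt, so the sketch assumes the standard classifier-free guidance form, $\epsilon = \epsilon_{\varnothing} + \lambda_{cfg}(\epsilon_{c} - \epsilon_{\varnothing})$, and assumes that negation means scaling the identity embedding by $-\lambda_{id}$. The function names and toy vectors are hypothetical; in the actual method the noise predictions would come from the diffusion model with an IP-Adapter.

```python
def negate_identity(embedding, lambda_id):
    """Return a negative identity guide (assumed form: -lambda_id * embedding)."""
    return [-lambda_id * e for e in embedding]


def cfg_combine(eps_uncond, eps_cond, lambda_cfg):
    """Merge unconditional and conditional noise predictions.

    Standard classifier-free guidance form, assumed here:
    eps = eps_uncond + lambda_cfg * (eps_cond - eps_uncond)
    """
    return [u + lambda_cfg * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy identity embedding and its negated guide.
id_embedding = [0.5, -0.25, 1.0]
negative_guide = negate_identity(id_embedding, lambda_id=1.0)

# Toy noise predictions standing in for the two denoising paths.
eps_uncond = [0.0, 1.0, 2.0]  # null-embedding path
eps_cond = [1.0, 1.0, 0.0]    # path conditioned on the negated identity
eps = cfg_combine(eps_uncond, eps_cond, lambda_cfg=2.0)

print(negative_guide)
print(eps)
```

Under this assumed form, `lambda_cfg = 0` reduces to the unconditional prediction and `lambda_cfg = 1` to the conditional one, while larger values push the result further along the direction from the unconditional to the conditional prediction. That is consistent with the Figure 5 observation that higher guidance scales widen identity distance.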