Latent Diffusion Models for Attribute-Preserving Image Anonymization

Luca Piano; Pietro Basci; Fabrizio Lamberti; Lia Morra

TL;DR

This paper presents the first approach to image anonymization based on Latent Diffusion Models (LDMs), and shows through extensive experimental comparison that the proposed method is competitive with the state-of-the-art concerning identity obfuscation whilst better preserving the original content of the image and tackling unresolved challenges that current solutions fail to address.

Abstract

Generative techniques for image anonymization have great potential to generate datasets that protect the privacy of those depicted in the images, while achieving high data fidelity and utility. Existing methods have focused extensively on preserving facial attributes, but have failed to embrace a more comprehensive perspective that incorporates the scene and background into the anonymization process. This paper presents, to the best of our knowledge, the first approach to image anonymization based on Latent Diffusion Models (LDMs). Every element of a scene is maintained to convey the same meaning, yet manipulated in a way that makes re-identification difficult. We propose two LDMs for this purpose. CAMOUFLaGE-Base exploits a combination of pre-trained ControlNets and a new controlling mechanism designed to increase the distance between the real and anonymized images. CAMOUFLaGE-Light is based on the Adapter technique, coupled with an encoding designed to efficiently represent the attributes of different persons in a scene. The former solution achieves superior performance on most metrics and benchmarks, while the latter cuts the inference time in half at the cost of fine-tuning a lightweight module. We show through extensive experimental comparison that the proposed method is competitive with the state-of-the-art concerning identity obfuscation whilst better preserving the original content of the image and tackling unresolved challenges that current solutions fail to address.

Paper Structure (26 sections, 1 equation, 6 figures, 3 tables)

Introduction
Related work
Image anonymization
Conditional image synthesis
Methodology
CAMOUFLaGE-Base
Information extraction, spatial and caption conditioning
Anonymization guidance
Reconstruction and sampling
CAMOUFLaGE-Light
Information extraction and training
Reconstruction, sampling and identity swapping
Experiments
Datasets
State of the art comparison
...and 11 more sections

Figures (6)

Figure 1: Comparison of the proposed method to DeepPrivacy2 hukkelaas2023deepprivacy2 and FALCO barattin2023attribute in terms of identity anonymization, face attribute preservation, scene preservation and background anonymization.
Figure 2: Left: architecture of CAMOUFLaGE-Base. The annotators extract five different types of spatial information and a caption from the input image; these are used as positive control to guide Stable Diffusion at inference time via a ControlNet module. The diffusion process starts from the encoding of the original image in the latent space, after adding some noise. An additional ControlNet, which takes the Variational Autoencoder (VAE) encoding directly as input and acts as the identity function (IDF), is used as negative control through a classifier-free guidance-like mechanism, controlled by an anonymization scale parameter $a_s$. All components are pretrained and no fine-tuning is needed, except for the IDF component. Right: anonymized images at varying anonymization scales $a_s$. Between $a_s=0$ and $a_s=1.0$, CAMOUFLaGE interpolates between the original image and the image synthesized from the positive controls. At $a_s>1.0$, the negative control is enabled, further pushing the output away from the original image within the limits afforded by the positive controls.
Figure 3: Architecture of CAMOUFLaGE-Light. IP-Adapter is fine-tuned to condition the image generation process by means of a decoupled cross-attention layer taking as input an image embedding (extracted from the FaRL visual encoder, to enhance control over visual features) and the caption. The caption is used only at training time to aid in encoding key scene characteristics. To prevent features from different persons from mixing, an ad-hoc encoding was designed to spatially encode, for each individual person, 40 facial attributes extracted from the FACER pre-trained model together with facial keypoints. This encoding is given as input to a T2I-Adapter module, jointly trained with the IP-Adapter module. During training, the SD module and the various encoders are kept frozen, while the IP-Adapter and T2I-Adapter modules are trainable.
Figure 4: Anonymization results on CelebA-HQ karras2017progressive in comparison with DeepPrivacy2 (DP2) hukkelas23DP2 and FALCO barattin2023attribute.
Figure 5: Anonymization results on LFW huang2008labeled in comparison with DeepPrivacy2 (DP2) hukkelas23DP2 and FALCO barattin2023attribute.
...and 1 more figures
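
The role of the anonymization scale $a_s$ described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes the standard linear classifier-free guidance form, applied to two noise predictions: one conditioned on the IDF (negative) control and one conditioned on the positive controls. The paper's own equation is not reproduced on this page, so the function name, argument names and the exact combination rule below are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def anonymization_guidance(
    eps_identity: Sequence[float],
    eps_positive: Sequence[float],
    a_s: float,
) -> List[float]:
    """Combine two noise predictions with a classifier-free-guidance-style rule.

    Hypothetical sketch (not the paper's equation):
      eps_identity: prediction conditioned on the IDF control, which pulls
                    the sample toward the original image.
      eps_positive: prediction conditioned on the positive controls
                    (spatial annotations and caption).
      a_s:          anonymization scale.
    """
    if len(eps_identity) != len(eps_positive):
        raise ValueError("noise predictions must have the same length")
    # a_s = 0   -> identity prediction only (stay close to the original)
    # a_s = 1   -> positive-control prediction only
    # a_s > 1   -> extrapolate past the positive prediction, away from identity
    return [
        e_id + a_s * (e_pos - e_id)
        for e_id, e_pos in zip(eps_identity, eps_positive)
    ]
```

Under this assumed rule, values of $a_s$ between 0 and 1 interpolate linearly between the two predictions, and values above 1 move the result further from the identity prediction, which mirrors the qualitative behaviour the caption describes.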
