Unsafe2Safe: Controllable Image Anonymization for Downstream Utility

Mih Dinh, SouYoung Jin

Abstract

Large-scale image datasets frequently contain identifiable or sensitive content, raising privacy risks when training models that may memorize and leak such information. We present Unsafe2Safe, a fully automated pipeline that detects privacy-prone images and rewrites only their sensitive regions using multimodally guided diffusion editing. Unsafe2Safe operates in two stages. Stage 1 uses a vision-language model to (i) inspect images for privacy risks, (ii) generate paired private and public captions that respectively include and omit sensitive attributes, and (iii) prompt a large language model to produce structured, identity-neutral edit instructions conditioned on the public caption. Stage 2 employs instruction-driven diffusion editors to apply these dual textual prompts, producing privacy-safe images that preserve global structure and task-relevant semantics while neutralizing private content. To measure anonymization quality, we introduce a unified evaluation suite covering Quality, Cheating, Privacy, and Utility dimensions. Across MS-COCO, Caltech101, and MIT Indoor67, Unsafe2Safe reduces face similarity, text similarity, and demographic predictability by large margins, while maintaining downstream model accuracy comparable to training on raw data. Fine-tuning diffusion editors on our automatically generated triplets (private caption, public caption, edit instruction) further improves both privacy protection and semantic fidelity. Unsafe2Safe provides a scalable, principled solution for constructing large, privacy-safe datasets without sacrificing visual consistency or downstream utility.
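As a rough illustration of the two-stage flow described above, the following minimal sketch traces one image through the pipeline. The helper objects and their methods (`vlm`, `llm`, `editor` and the calls on them) are hypothetical placeholders for illustration only; the abstract does not specify the actual model interfaces.

```python
# Minimal sketch of the Unsafe2Safe two-stage pipeline described in the
# abstract. All helper names (vlm, llm, editor and their methods) are
# hypothetical placeholders, not the authors' actual API.
from dataclasses import dataclass

@dataclass
class EditTriplet:
    private_caption: str   # c^priv: caption including sensitive attributes
    public_caption: str    # c^pub: caption omitting sensitive attributes
    edit_instruction: str  # c^edit: identity-neutral edit instruction

def unsafe2safe(image, vlm, llm, editor):
    # Stage 1: inspect the image for privacy risks and build the
    # textual conditions used to drive the edit.
    if not vlm.has_privacy_risk(image):
        return image, None  # image is already safe; leave it untouched
    c_priv = vlm.caption(image, include_sensitive=True)
    c_pub = vlm.caption(image, include_sensitive=False)
    c_edit = llm.edit_instruction(public_caption=c_pub)
    triplet = EditTriplet(c_priv, c_pub, c_edit)

    # Stage 2: instruction-driven diffusion editing conditioned on both
    # the edit instruction and the public caption, so private content is
    # neutralized while scene semantics are preserved.
    safe_image = editor.edit(image, instruction=c_edit, caption=c_pub)
    return safe_image, triplet
```

The returned triplet mirrors the (private caption, public caption, edit instruction) triplets the abstract says are used to fine-tune the diffusion editors.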

Figures (20)

  • Figure 1: Examples from Unsafe2Safe (U2S). For each case, the model converts an unsafe image into a privacy-preserving safe version. Examples demonstrate key capabilities that may appear simultaneously: (1) structure-preserving full body anonymization, (2) demographic neutralization (race entropy $\uparrow$), and (3) obfuscation of non-human confidential details.
  • Figure 2: Pipeline Overview. A VLM inspects the image for privacy risks. For flagged images, it generates a private caption $c^{priv}$ and a public caption $c^{pub}$ without sensitive details. An LLM then produces an edit instruction $c^{edit}$ specifying how sensitive attributes should be modified. A diffusion editor uses these priors to generate a privacy-safe image while preserving scene semantics.
  • Figure 3: SafeAttention within UNet. The UNet transformer receives two textual conditions: the edit instruction and the public caption. The Cross Attention module follows the standard cross-attention pathway, while an auxiliary Safe Attention module operates on both embeddings to reinforce non-private semantics during denoising (see the sketch after this list).
  • Figure 4: Qualitative comparison of anonymization outputs on Caltech101. Each image is shown with its model family (top line) and textual condition (bottom line). All methods preserve the global layout of the original scene, but Unsafe2Safe with FlowEdit [kulikov2025flowedit], unlike face-only anonymizers (DP2, FaceAnon), also modifies background elements when they contain privacy-relevant cues while keeping the overall scene composition intact.
  • Figure 5: Qualitative examples showing that the InstructPix2Pix model fine-tuned on our dataset (FT) anonymizes sensitive content more effectively while preserving original class semantics, compared to the model trained on general editing data.
  • ...and 15 more figures
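The Figure 3 caption describes an auxiliary Safe Attention branch running alongside standard cross-attention inside the UNet. Below is a minimal PyTorch sketch of one plausible reading: a second cross-attention path that attends to the concatenated edit-instruction and public-caption embeddings and is fused residually with the standard pathway. The module name, dimensions, and the additive fusion are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SafeAttentionBlock(nn.Module):
    """One plausible reading of Fig. 3 (assumption, not the paper's exact
    design): standard cross-attention over the edit-instruction embedding,
    plus an auxiliary Safe Attention path attending jointly to the
    edit-instruction and public-caption embeddings."""

    def __init__(self, dim: int, ctx_dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(
            dim, num_heads, kdim=ctx_dim, vdim=ctx_dim, batch_first=True)
        self.safe_attn = nn.MultiheadAttention(
            dim, num_heads, kdim=ctx_dim, vdim=ctx_dim, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, edit_emb, pub_emb):
        # x: (B, N, dim) UNet tokens; edit_emb, pub_emb: (B, L, ctx_dim)
        # Standard cross-attention pathway over the edit instruction.
        h, _ = self.cross_attn(self.norm1(x), edit_emb, edit_emb)
        x = x + h
        # Auxiliary Safe Attention over both embeddings, reinforcing
        # non-private semantics during denoising.
        both = torch.cat([edit_emb, pub_emb], dim=1)
        h, _ = self.safe_attn(self.norm2(x), both, both)
        return x + h
```

Residual addition keeps the standard editing pathway intact while letting the auxiliary branch inject public-caption semantics; whether the paper fuses the two paths this way or differently is not stated in the caption.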