Beyond Anonymization: Object Scrubbing for Privacy-Preserving 2D and 3D Vision Tasks

Murat Bilgehan Ertan, Ronak Sahu, Phuong Ha Nguyen, Kaleel Mahmood, Marten van Dijk

TL;DR

ROAR presents a privacy-preserving dataset obfuscation framework that removes sensitive objects through instance segmentation and generative inpainting, preserving scene integrity for both 2D detection and 3D NeRF reconstruction. It combines Mask2Former detection, latent-diffusion or GAN-based inpainting, and an oracle-based re-annotation step to quantify privacy-utility trade-offs on COCO and NeRF datasets. Results show that scrubbing maintains substantially higher utility than image dropping, achieving around 87.5% of baseline AP in 2D and at most a 1.66 dB PSNR loss in 3D reconstruction, with diffusion-based methods delivering superior perceptual quality. Overall, ROAR demonstrates that object removal can offer strong privacy guarantees with minimal impact on downstream tasks, while highlighting open challenges in segmentation robustness and cross-task generalization.

Abstract

We introduce ROAR (Robust Object Removal and Re-annotation), a scalable framework for privacy-preserving dataset obfuscation that eliminates sensitive objects instead of modifying them. Our method integrates instance segmentation with generative inpainting to remove identifiable entities while preserving scene integrity. Extensive evaluations on 2D COCO-based object detection show that ROAR achieves 87.5% of the baseline detection average precision (AP), whereas image dropping achieves only 74.2% of the baseline AP, highlighting the advantage of scrubbing in preserving dataset utility. The degradation is even more severe for small objects due to occlusion and loss of fine-grained details. Furthermore, in NeRF-based 3D reconstruction, our method incurs a PSNR loss of at most 1.66 dB while maintaining SSIM and improving LPIPS, demonstrating superior perceptual quality. Our findings establish object removal as an effective privacy framework, achieving strong privacy guarantees with minimal performance trade-offs. The results highlight key challenges in generative inpainting, occlusion-robust segmentation, and task-specific scrubbing, setting the foundation for future advancements in privacy-preserving vision systems.
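
The headline numbers above are relative quantities, so a short worked example may help in reading them. The sketch below assumes that "87.5% of the baseline AP" means the ratio of the AP obtained after obfuscation to the AP obtained on raw data, and it uses the standard PSNR definition to translate a dB drop into a mean-squared-error ratio. The absolute AP values are hypothetical, chosen only so that the ratios match the abstract; they are not results from the paper.

```python
import math


def relative_utility(ap_obfuscated: float, ap_baseline: float) -> float:
    """Percentage of the baseline AP retained after obfuscation."""
    if ap_baseline <= 0:
        raise ValueError("baseline AP must be positive")
    return 100.0 * ap_obfuscated / ap_baseline


def psnr(mse: float, max_value: float = 1.0) -> float:
    """Standard PSNR in dB for a given mean squared error."""
    if mse <= 0:
        raise ValueError("MSE must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_psnr_drop(drop_db: float) -> float:
    """Factor by which MSE grows when PSNR falls by drop_db decibels."""
    return 10.0 ** (drop_db / 10.0)


# Hypothetical absolute APs (assumption, not from the paper).
baseline_ap = 40.0
scrubbed_ap = 35.0   # 35.0 / 40.0 -> 87.5% of baseline
dropped_ap = 29.68   # 29.68 / 40.0 -> about 74.2% of baseline

print(relative_utility(scrubbed_ap, baseline_ap))
print(relative_utility(dropped_ap, baseline_ap))
# A 1.66 dB PSNR loss corresponds to roughly a 1.47x increase in MSE.
print(mse_ratio_for_psnr_drop(1.66))
```

Under this reading, the 1.66 dB bound says the reconstruction error on scrubbed scenes is at most about one and a half times the error on raw scenes.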

Paper Structure

This paper contains 39 sections, 23 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: Privacy-preserving transformations for dataset obfuscation. The input image (top) contains sensitive objects, detected via instance segmentation (middle). Three different obfuscation strategies are applied: (left) DeepPrivacy2 [hukkelaas2023deepprivacy2] anonymization, (middle) ours, which removes sensitive objects while maintaining scene integrity, and (right) full data deletion, which ensures maximum privacy at the cost of utility.
  • Figure 2: Privacy-preserving dataset obfuscation pipeline. Raw Dataset: Input data includes COCO [lin2015microsoftcococommonobjects] for 2D detection and NeRF scenes [mildenhall2020nerf] for 3D reconstruction. Sensitive Object Detection: Instance segmentation (e.g., Mask2Former [cheng2022maskedattentionmasktransformeruniversal]) identifies sensitive objects in 2D datasets, while NeRF-based datasets require manual selection. Obfuscation: Sensitive objects are removed using generative inpainting methods such as diffusion models (e.g., Stable Diffusion [Rombach_2022_CVPR, kandinsky2023]) and GAN-based models (e.g., AOT-GAN [zeng2021aggregatedcontextualtransformationshighresolution]). Re-annotation: An oracle model (e.g., RT-DETRv2 [lv2024rtdetrv2improvedbaselinebagoffreebies]) updates labels post-obfuscation to maintain dataset integrity. Processed Dataset: The resulting dataset ensures privacy while preserving contextual integrity. Privacy & Utility Evaluation: Privacy is verified via an oracle, while utility is measured by training object detection (e.g., YOLOv9, RT-DETRv2) and 3D reconstruction (e.g., NeRF) models. Model Training and Comparison: Detection models are trained on both raw and obfuscated datasets to assess performance trade-offs.
  • Figure 3: Each row shows a different image in three versions: raw (left), anonymized (middle) [hukkelaas2023deepprivacy2], and our approach (right). The first two images are scrubbed with Stable Diffusion [Rombach_2022_CVPR], and the last two are scrubbed with Kandinsky [kandinsky2023].
  • Figure 4: Privacy Efficiency. The bar plots show Person Removal Efficiency (PE%) and Image Removal Efficiency (IE%). KD (Kandinsky), SD (Stable Diffusion), and AOT (AOT-GAN) denote different inpainting methods, while Drop removes sensitive images.
  • Figure 5: Comparison of FP (scrubbing sensitive objects) and FP.drop (dropping images with sensitive objects) accuracies across different clusters. The improvement factor indicates how much better scrubbing performs compared to dropping on average, particularly highlighting the significant advantages for small and occlusion-prone objects.
  • ...and 12 more figures
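
The pipeline described in the Figure 2 caption (detect sensitive objects, inpaint them away, then re-annotate with an oracle) can be sketched as a small function with pluggable stages. This is a minimal illustration under stated assumptions, not the authors' implementation: `scrub_image` and the three toy stand-ins are hypothetical names, images are plain nested lists, and the toy stages only mimic the roles that Mask2Former, a diffusion or GAN inpainter, and RT-DETRv2 play in the paper.

```python
from typing import Callable, List, Set, Tuple

Image = List[List[int]]   # toy grayscale image (assumption)
Mask = List[List[bool]]   # True where a pixel belongs to an object


def scrub_image(
    image: Image,
    segment: Callable[[Image], List[Tuple[str, Mask]]],
    inpaint: Callable[[Image, Mask], Image],
    reannotate: Callable[[Image], List[str]],
    sensitive: Set[str],
) -> Tuple[Image, List[str]]:
    """Remove every sensitive instance, then relabel the scrubbed image."""
    scrubbed = image
    for label, mask in segment(image):
        if label in sensitive:
            scrubbed = inpaint(scrubbed, mask)
    return scrubbed, reannotate(scrubbed)


# Toy stand-ins: a "person" is any pixel with value 255.
def toy_segment(image: Image) -> List[Tuple[str, Mask]]:
    mask = [[px == 255 for px in row] for row in image]
    return [("person", mask)] if any(any(row) for row in mask) else []


def toy_inpaint(image: Image, mask: Mask) -> Image:
    # Fill masked pixels with the mean of the unmasked background.
    keep = [px for row, mrow in zip(image, mask)
            for px, m in zip(row, mrow) if not m]
    fill = sum(keep) // len(keep) if keep else 0
    return [[fill if m else px for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def toy_oracle(image: Image) -> List[str]:
    # Returns the labels still detectable after scrubbing.
    return ["person"] if any(px == 255 for row in image for px in row) else []


raw = [[10, 10, 10],
       [10, 255, 10],
       [10, 10, 10]]
scrubbed, labels = scrub_image(raw, toy_segment, toy_inpaint, toy_oracle, {"person"})
print(scrubbed, labels)
```

Passing the stages as callables mirrors the caption's "e.g." phrasing: the segmenter, inpainter, and oracle are interchangeable, and an empty label list from the oracle plays the role of the privacy check.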