Privacy Blur: Quantifying Privacy and Utility for Image Data Release
Saeed Mahloujifar, Narine Kokhlikyan, Chuan Guo, Kamalika Chaudhuri
TL;DR
The paper investigates how to release image data privately while keeping it useful for model training. It introduces a formal privacy-utility framework and compares four obfuscation methods: Gaussian blur, pixelization, DP-Pix, and cropping. Gaussian blur turns out to be reversible in practice and therefore weak on privacy, while pixelization and DP-Pix can offer strong privacy with minor utility loss when tuned. Privacy is quantified with reversal and discrimination attacks developed for this purpose. Extensive experiments across OCR-like datasets and vision tasks show that cropping is highly private but harms utility, whereas pixelization and DP-Pix strike a favorable balance for many tasks. The methods and suggested parameters are packaged as the Privacy Blur software package to guide privacy-aware image data releases.
Abstract
Image data collected in the wild often contains private information such as faces and license plates, and responsible data release must ensure that this information stays hidden. At the same time, released data should remain useful for model training. The standard method for obfuscating private information in images is Gaussian blurring. In this work, we show that practical implementations of Gaussian blurring are reversible enough to break privacy. We then take a closer look at the privacy-utility tradeoffs offered by three other obfuscation algorithms: pixelization, pixelization with noise addition (DP-Pix), and cropping. Privacy is evaluated with reversal and discrimination attacks, while utility is measured by the quality of the learnt representations when a model is trained on data with obfuscated faces. We show that Gaussian blur, the most popular industry-standard method, is the least private of the four, being susceptible to reversal attacks in its practical low-precision implementations. In contrast, pixelization and pixelization with noise addition, when used at the right level of granularity, offer both privacy and utility for a number of computer vision tasks. We make our proposed methods, together with suggested parameters, available in a software package called Privacy Blur.
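To make the two noise-free and noisy pixelization variants named above concrete, the following is a minimal sketch, not the Privacy Blur package's API. It assumes a 2-D grayscale image with values in [0, 255], and it assumes the commonly cited DP-Pix formulation in which Laplace noise of scale 255·m / (b²·ε) is added to each b×b block mean. The function names (`pixelize`, `dp_pix`, `upsample`) and the parameters `block`, `epsilon`, and `m` are hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def pixelize(img: np.ndarray, block: int) -> np.ndarray:
    """Replace each block x block cell of a 2-D image by its mean.

    Rows and columns that do not fill a whole cell are cropped away.
    Returns one value per cell (shape: h // block, w // block).
    """
    h, w = img.shape
    h2, w2 = h - h % block, w - w % block
    cells = img[:h2, :w2].astype(np.float64)
    cells = cells.reshape(h2 // block, block, w2 // block, block)
    return cells.mean(axis=(1, 3))


def dp_pix(img: np.ndarray, block: int, epsilon: float, m: int,
           rng: np.random.Generator) -> np.ndarray:
    """Pixelize, then add Laplace noise to every cell mean.

    Assumption: noise scale 255 * m / (block**2 * epsilon), where m is
    the number of pixels allowed to differ between neighbouring images.
    """
    means = pixelize(img, block)
    scale = 255.0 * m / (block ** 2 * epsilon)
    noisy = means + rng.laplace(loc=0.0, scale=scale, size=means.shape)
    return np.clip(noisy, 0.0, 255.0)


def upsample(cells: np.ndarray, block: int) -> np.ndarray:
    """Expand each cell value back to a block x block patch for display."""
    return np.repeat(np.repeat(cells, block, axis=0), block, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)  # stand-in region
    coarse = upsample(pixelize(face, block=8), block=8)
    private = upsample(dp_pix(face, block=8, epsilon=1.0, m=16, rng=rng), block=8)
    print(coarse.shape, private.shape)
```

In this sketch, a larger `block` or a smaller `epsilon` hides more detail at the cost of utility, which is the granularity tradeoff the abstract refers to. The specific values used in the usage example are arbitrary and are not the parameters suggested by the authors.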
