SemDP: Semantic-level Differential Privacy Protection for Face Datasets
Xiaoting Zhang, Tao Wang, Junhao Ji
TL;DR
This work addresses privacy risks in publishing large face image datasets by arguing that pixel- or image-level differential privacy is insufficient to protect semantic information. It proposes a semantic-level DP pipeline that (i) builds a face attribute database from labeled attributes, (ii) perturbs those attributes using a randomized-response mechanism with a privacy budget $\varepsilon_w$ to satisfy $\varepsilon_w$-DP, and (iii) synthesizes a protected face image dataset via a GAN conditioned on the perturbed attributes. The approach provides provable DP guarantees for the semantic attributes while preserving image realism, and it shows improved privacy-utility trade-offs compared with state-of-the-art DP-based schemes on CelebA. The results suggest a practical path to safer sharing of face datasets; future work will extend the approach to video with temporally consistent protection.
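For a single binary attribute (CelebA annotates each image with 40 binary attributes), the textbook binary randomized-response mechanism reports the true value with probability $e^{\varepsilon_w}/(1+e^{\varepsilon_w})$ and the flipped value otherwise. The derivation below is a sketch of why this choice yields $\varepsilon_w$-DP for one attribute. It assumes the standard binary form; the exact mechanism used by SemDP and how $\varepsilon_w$ is allocated across attributes may differ.

$$
\Pr[\tilde{a} = a] = \frac{e^{\varepsilon_w}}{1 + e^{\varepsilon_w}}, \qquad
\Pr[\tilde{a} = 1 - a] = \frac{1}{1 + e^{\varepsilon_w}}, \qquad a \in \{0, 1\}
$$

$$
\max_{o,\,a,\,a'} \frac{\Pr[\tilde{a} = o \mid a]}{\Pr[\tilde{a} = o \mid a']}
= \frac{e^{\varepsilon_w} / (1 + e^{\varepsilon_w})}{1 / (1 + e^{\varepsilon_w})}
= e^{\varepsilon_w}
$$

As a worked example, with $\varepsilon_w = \ln 3$ the true bit is kept with probability $3/4$ and flipped with probability $1/4$, so the worst-case likelihood ratio is $3 = e^{\varepsilon_w}$.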
Abstract
While large-scale face datasets have advanced deep learning-based face analysis, they also raise privacy concerns because of the sensitive personal information they contain. Recent schemes apply differential privacy to protect face datasets. However, these schemes generally treat each image as a separate database, which does not fully meet the core requirements of differential privacy. In this paper, we propose a semantic-level differential privacy protection scheme that applies to the entire face dataset. Unlike pixel-level differential privacy approaches, our scheme guarantees that the semantic privacy of faces is not compromised. The key idea is to convert unstructured image data into structured attribute data so that differential privacy can be applied. Specifically, we first extract semantic information from the face dataset to build an attribute database, then apply differentially private perturbations to obscure the attribute data, and finally use an image synthesis model to generate a protected face dataset. Extensive experiments show that our scheme maintains visual naturalness and achieves a better privacy-utility trade-off than mainstream schemes.
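The first two steps of this pipeline (attribute database, then perturbation) can be illustrated with a small Python sketch. It treats the attribute database as a list of binary attribute vectors, one row per image, perturbs every entry with binary randomized response, and shows how aggregate attribute frequencies can still be estimated from the noisy data. All function names are hypothetical, and the sketch assumes binary attributes with each attribute receiving the full budget `epsilon`, so one record with d attributes is (d·epsilon)-DP by sequential composition. The image synthesis step is omitted.

```python
import math
import random
from typing import List, Sequence


def keep_probability(epsilon: float) -> float:
    """Probability that binary randomized response reports the true bit."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report `bit` with probability e^eps / (1 + e^eps), else its complement."""
    if rng.random() < keep_probability(epsilon):
        return bit
    return 1 - bit


def perturb_attribute_database(
    db: Sequence[Sequence[int]], epsilon: float, seed: int = 0
) -> List[List[int]]:
    """Hypothetical helper: perturb every binary attribute of every record.

    Assumption: each attribute gets the full budget `epsilon`, so a record
    with d attributes is (d * epsilon)-DP by sequential composition.
    """
    rng = random.Random(seed)
    return [[randomized_response(b, epsilon, rng) for b in row] for row in db]


def debiased_frequency(column: Sequence[int], epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1s in one perturbed column.

    E[observed] = (1 - p) + f * (2p - 1), solved for f; requires epsilon > 0.
    """
    p = keep_probability(epsilon)
    observed = sum(column) / len(column)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    # Hypothetical toy database: 3 images x 3 binary attributes.
    db = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
    print(perturb_attribute_database(db, epsilon=1.0, seed=42))
```

The noisy rows have the same shape as the input and would be passed to a conditional generator in the synthesis step. Dividing $\varepsilon_w$ evenly across the d attributes would be one way to keep each record $\varepsilon_w$-DP overall; that allocation is an assumption of this sketch, not a detail stated here for SemDP.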
