
SemDP: Semantic-level Differential Privacy Protection for Face Datasets

Xiaoting Zhang, Tao Wang, Junhao Ji

TL;DR

This work addresses privacy risks in publishing large face image datasets by arguing that pixel- or image-level differential privacy is insufficient to protect semantic information. It proposes a semantic-level DP pipeline that (i) builds a face attribute database from labeled attributes, (ii) perturbs those attributes using a randomized-response mechanism with a privacy budget $\varepsilon_w$ to satisfy $\varepsilon_w$-DP, and (iii) synthesizes a protected face image dataset via a GAN conditioned on the perturbed attributes. The approach demonstrates provable semantic privacy for attributes while preserving image realism, and shows improved privacy-utility trade-offs compared with state-of-the-art DP-based schemes on CelebA. The results suggest a practical path to safer sharing of face datasets, with future work extending to video and temporally consistent protection.
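
Step (ii), the randomized-response perturbation of the attribute database, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each attribute is binary and encoded as 0/1 (CelebA stores attributes as ±1, so -1 would first be mapped to 0). It also assumes every entry is perturbed independently with the same budget $\varepsilon_w$, using the standard binary randomized-response rule: keep the true value with probability $e^{\varepsilon_w}/(1+e^{\varepsilon_w})$, otherwise flip it. The function names `keep_probability` and `randomized_response` are hypothetical.

```python
import math

import numpy as np


def keep_probability(epsilon_w: float) -> float:
    """Probability that binary randomized response reports the true value."""
    return math.exp(epsilon_w) / (1.0 + math.exp(epsilon_w))


def randomized_response(attrs: np.ndarray, epsilon_w: float,
                        rng: np.random.Generator) -> np.ndarray:
    """Perturb a 0/1 attribute matrix (rows = faces, columns = attributes).

    Assumption (not from the paper): every entry is perturbed independently
    with the same budget epsilon_w; each entry is kept with probability
    e^eps / (1 + e^eps) and flipped otherwise.
    """
    attrs = np.asarray(attrs)
    if not np.isin(attrs, (0, 1)).all():
        raise ValueError("attributes must be encoded as 0/1")
    p_keep = keep_probability(epsilon_w)
    # One uniform draw per entry; flip the entry when the draw is >= p_keep.
    flip = rng.random(attrs.shape) >= p_keep
    return np.where(flip, 1 - attrs, attrs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical attribute database: 1000 faces x 5 binary attributes.
    original = rng.integers(0, 2, size=(1000, 5))
    released = randomized_response(original, epsilon_w=1.0, rng=rng)
    # The fraction of unchanged entries should be close to keep_probability(1.0).
    print((released == original).mean(), keep_probability(1.0))
```

Because each face's released attribute vector depends only on its own original vector, the released database can be handed to the Stage-III generator without any further access to the original attributes.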

Abstract

While large-scale face datasets have advanced deep learning-based face analysis, they also raise privacy concerns because of the sensitive personal information they contain. Recent schemes have applied differential privacy to protect face datasets. However, these schemes generally treat each image as a separate database, which does not fully meet the core requirements of differential privacy. In this paper, we propose a semantic-level differential privacy protection scheme that applies to the entire face dataset. Unlike pixel-level differential privacy approaches, our scheme guarantees that the semantic privacy of faces is not compromised. The key idea is to convert unstructured data (images) into structured data (attributes) so that differential privacy can be applied. Specifically, we first extract semantic information from the face dataset to build an attribute database, then apply differentially private perturbations to obscure this attribute data, and finally use an image synthesis model to generate a protected face dataset. Extensive experimental results show that, compared with mainstream schemes, our scheme maintains visual naturalness and strikes a better privacy-utility trade-off.
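
Why a randomized-response perturbation gives a differential-privacy guarantee on the attribute database can be checked directly. For illustration, assume a single binary attribute $a \in \{0,1\}$ is released as $\tilde a$ by keeping it with probability $e^{\varepsilon_w}/(1+e^{\varepsilon_w})$ and flipping it otherwise. The summary does not state the paper's exact encoding or how the budget is allocated across attributes, so this is a sketch of the standard mechanism only.

$$
\Pr[\tilde a = b \mid a = b] = \frac{e^{\varepsilon_w}}{1+e^{\varepsilon_w}}, \qquad
\Pr[\tilde a = b \mid a = 1-b] = \frac{1}{1+e^{\varepsilon_w}}
$$

$$
\frac{\Pr[\tilde a = b \mid a = b]}{\Pr[\tilde a = b \mid a = 1-b]} = e^{\varepsilon_w}
\quad\Longrightarrow\quad
\Pr[\tilde a = b \mid a] \le e^{\varepsilon_w}\,\Pr[\tilde a = b \mid a'] \quad \text{for all } a, a', b \in \{0,1\}.
$$

Since the likelihood ratio of any output under any two inputs is at most $e^{\varepsilon_w}$, one attribute is protected at level $\varepsilon_w$. If a face has $k$ attributes and attribute $j$ is perturbed independently with budget $\varepsilon_j$, sequential composition bounds the loss for that face's whole attribute vector by $\sum_{j=1}^{k} \varepsilon_j$. Keeping the per-face total at $\varepsilon_w$ therefore requires splitting the budget, for example $\varepsilon_j = \varepsilon_w / k$; which allocation the paper uses is not stated here.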

Paper Structure

This paper contains 16 sections, 18 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: The workflow of the proposed scheme. Stage-I: Constructing the face attribute database corresponding to the face image dataset. Stage-II: Generating the released face attribute database under the randomized response mechanism. Stage-III: Implementing the image synthesis technique to obtain the released face image dataset.
  • Figure 2: Visual results for a single attribute. Original faces are shown in the first column, while the remaining columns show the corresponding faces with a single perturbed attribute under different $\varepsilon$. From top to bottom, the attributes are "Bangs", "Blond Hair", "Male", "Pale Skin", and "Young", respectively.
  • Figure 3: Visual results for multiple attributes. Original faces are shown in the first column, while the remaining columns show the corresponding faces with multiple perturbed attributes under different $\varepsilon$.
  • Figure 4: Attribute privacy protection performance. The accuracies for every attribute are evaluated on both the original images and the synthesized results under varying $\varepsilon_w$ values.
  • Figure 5: Attribute utility performance. The accuracies for each attribute are assessed on both the original images and the synthesized outcomes with different $\varepsilon_w$ values.
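
Figures 4 and 5 track per-attribute accuracy as $\varepsilon_w$ varies. A general property of binary randomized response, independent of this paper's experiments, helps explain why utility falls as $\varepsilon_w$ shrinks. The released bit agrees with the true bit with probability $p = e^{\varepsilon_w}/(1+e^{\varepsilon_w})$, so its correlation with the truth is $2p-1 = \tanh(\varepsilon_w/2)$, which vanishes as $\varepsilon_w \to 0$. The sketch below assumes 0/1 attributes, each entry perturbed independently with keep probability $p$. It shows that dataset-level attribute frequencies can still be recovered with the standard debiasing estimator, at a variance that grows like $1/(2p-1)^2$. This is a statistic of the attribute database, not the classifier accuracy on synthesized images that the figures report; all names are hypothetical.

```python
import math

import numpy as np


def keep_probability(epsilon_w: float) -> float:
    """Binary randomized response: probability of reporting the true value."""
    return math.exp(epsilon_w) / (1.0 + math.exp(epsilon_w))


def estimate_true_frequency(released_column: np.ndarray, epsilon_w: float) -> float:
    """Unbiased estimate of the true fraction of 1s in one attribute column.

    Solves E[observed] = p * pi + (1 - p) * (1 - pi) for pi.
    """
    if epsilon_w <= 0:
        raise ValueError("epsilon_w must be positive (at 0 the output is pure noise)")
    p = keep_probability(epsilon_w)
    observed = float(np.mean(released_column))
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, true_freq = 50_000, 0.3  # hypothetical column for one binary attribute
    column = (rng.random(n) < true_freq).astype(int)
    for eps in (0.1, 0.5, 1.0, 2.0):
        p = keep_probability(eps)
        # Flip each entry with probability 1 - p.
        released = np.where(rng.random(n) >= p, 1 - column, column)
        print(f"eps={eps}: p_keep={p:.3f}, naive={released.mean():.3f}, "
              f"debiased={estimate_true_frequency(released, eps):.3f}")
```

At small budgets the naive frequency is pulled toward 0.5, and the debiased estimate stays centered on the true value but becomes noisier, which is the same privacy-utility tension the figures explore at the image level.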