A Lightweight Privacy-Preserving Scheme Using Label-based Pixel Block Mixing for Image Classification in Deep Learning
Yuexin Xiang, Tiantian Li, Wei Ren, Tianqing Zhu, Kim-Kwang Raymond Choo
TL;DR
This work tackles privacy leakage in image data used for training deep learning models by introducing a lightweight pixel block mixing scheme that preserves training utility while obscuring sensitive personal features. The method partitions images into blocks and performs probabilistic cross-image substitutions among same-label samples, governed by parameters such as $N_t$, $N_s$, $L_b$, and $W_b$, to produce a mixed dataset $I_T$. Experimental evaluations on the WIKI and CNBC face datasets show that the approach maintains competitive accuracy for several CNNs (e.g., ResNet50, VGG16, InceptionV3, DenseNet121) while reducing similarity to the original images as $N_b$ increases, with data augmentation further boosting performance. The results demonstrate a practical privacy-preserving option that can be tuned for different privacy-utility requirements and computational budgets, enabling wider deployment in privacy-sensitive DL training scenarios.
Abstract
To protect the privacy of sensitive data used to train deep learning models, the research community has designed a number of privacy-preserving methods. However, existing schemes are generally designed for textual data, or are inefficient when a large number of images is used for training. In this paper, we therefore propose a lightweight and efficient approach that preserves image privacy while maintaining the availability of the training set. Specifically, we design a pixel block mixing algorithm for privacy-preserving image classification in deep learning. To evaluate its utility, we use the mixed training set to train the ResNet50, VGG16, InceptionV3, and DenseNet121 models on the WIKI dataset and the CNBC face dataset. Experimental results on the testing set show that our scheme preserves image privacy while maintaining the availability of the training set for these models. In particular, we achieve good performance with the VGG16 model on the WIKI dataset and with both ResNet50 and DenseNet121 on the CNBC dataset. The pixel block mixing algorithm mixes images with fairly high efficiency, and it is computationally challenging for attackers to restore the mixed training set to the original training set. Moreover, data augmentation can be applied to the mixed training set to improve training effectiveness.
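The idea of label-based pixel block mixing summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mix_pixel_blocks`, the parameters `block_h`, `block_w`, and `swap_prob`, and the choice to draw each replacement block from the same position in a randomly chosen same-label image are all assumptions made for illustration, and the paper's own parameters ($N_t$, $N_s$, $L_b$, $W_b$) are not reproduced here.

```python
import numpy as np


def mix_pixel_blocks(images, labels, block_h, block_w, swap_prob=0.5, seed=0):
    """Hypothetical sketch of label-based pixel block mixing.

    images: array of shape (N, H, W) or (N, H, W, C)
    labels: array of shape (N,)
    block_h, block_w: assumed block height and width in pixels
    swap_prob: assumed probability of replacing any given block
    """
    rng = np.random.default_rng(seed)
    images = np.asarray(images)
    labels = np.asarray(labels)
    mixed = images.copy()  # the originals are never modified
    n, height, width = images.shape[:3]

    for i in range(n):
        # Candidate donors: other images that share this image's label.
        donors = np.flatnonzero(labels == labels[i])
        donors = donors[donors != i]
        if donors.size == 0:
            continue  # no same-label partner, so leave this image as-is

        for top in range(0, height, block_h):
            for left in range(0, width, block_w):
                if rng.random() < swap_prob:
                    j = rng.choice(donors)
                    # Copy the block at the same position from the donor.
                    mixed[i, top:top + block_h, left:left + block_w] = (
                        images[j, top:top + block_h, left:left + block_w]
                    )
    return mixed
```

In this sketch, raising `swap_prob` or shrinking the blocks moves each mixed image further from its original, while every block still comes from an image with the same label, which is the property that lets the mixed set remain usable for training a classifier.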
