Can Encrypted Images Still Train Neural Networks? Investigating Image Information and Random Vortex Transformation
XiaoKai Cao, WenJin Mo, ChangDong Wang, JianHuang Lai, Qiong Huang
TL;DR
This work defines a formal image information-content framework that combines pixel values with a nonlocal, distance-based neighboring-information term, enabling theoretical analysis of how transformations affect information. It proves that pixel swaps cannot increase total information content and introduces Random Vortex Transformation (RVT), an encryption that disturbs pixel coordinates along circumferences while preserving neighborhood relationships. The authors validate RVT experimentally, showing that neural models such as ResNet and Vision Transformer can be trained on RVT-encrypted data with minimal accuracy loss on MNIST and Fashion-MNIST and a modest loss on CIFAR-10, while random permutations severely degrade performance. The study demonstrates a path toward privacy-preserving machine learning on encrypted imagery and highlights a tangible link between information-content loss and accuracy degradation, with potential extensions to federated learning.
Abstract
Vision is one of the essential sources through which humans acquire information. In this paper, we establish a novel framework for measuring image information content in order to evaluate how that content varies under image transformations. Within this framework, we design a nonlinear function to calculate the neighboring information content of pixels at different distances, and then use it to measure the overall information content of the image. On this basis, we define a function that represents the variation in information content during image transformations. We also use the framework to prove that swapping the positions of any two pixels reduces the image's information content. Furthermore, based on this framework, we propose a novel image encryption algorithm called Random Vortex Transformation. This algorithm encrypts the image using random functions while preserving the neighboring information of the pixels. The encrypted images are difficult for the human eye to recognize, yet machine learning methods can be trained directly on them. Experimental verification demonstrates that training ResNet and Vision Transformers on the encrypted dataset reduces accuracy by only 0.3% to 6.5% compared with the original data, while ensuring the security of the data. Moreover, there is a positive correlation between the rate of information loss in the images and the rate of accuracy loss, which further supports the validity of the proposed image information content measurement framework.
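
To make the idea of moving pixels along circumferences concrete, the following is a minimal toy sketch, not the authors' Random Vortex Transformation. It assumes that pixels are grouped into rings by their rounded distance from the image center, and that each ring is cyclically shifted by a random offset drawn from a seeded generator that acts as the key. The function names `ring_rotation_permutation` and `apply_permutation` are hypothetical. Because each ring is shifted independently, this sketch keeps neighbors within a ring adjacent but does not guarantee the cross-ring neighborhood preservation that the abstract attributes to RVT.

```python
import numpy as np


def ring_rotation_permutation(height, width, rng):
    """Build a pixel permutation that shifts each ring around the center.

    Illustrative assumption only: rings are sets of pixels sharing the same
    rounded radius, and every ring gets an independent random cyclic shift.
    Returns `perm` such that encrypted.flat[i] = image.flat[perm[i]].
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    radius = np.rint(np.hypot(ys - cy, xs - cx)).astype(int).ravel()
    angle = np.arctan2(ys - cy, xs - cx).ravel()

    perm = np.arange(height * width)
    for r in np.unique(radius):
        members = np.flatnonzero(radius == r)
        # Sort ring members by angle so a cyclic shift walks the circumference.
        ordered = members[np.argsort(angle[members], kind="stable")]
        shift = int(rng.integers(0, len(ordered)))
        perm[ordered] = np.roll(ordered, shift)
    return perm


def apply_permutation(image, perm):
    """Rearrange the pixels of a 2-D image according to a flat permutation."""
    return image.ravel()[perm].reshape(image.shape)


# Usage: the seed plays the role of the secret key (hypothetical setup).
rng = np.random.default_rng(0)
image = np.arange(64).reshape(8, 8)
perm = ring_rotation_permutation(8, 8, rng)
encrypted = apply_permutation(image, perm)
# argsort of a permutation is its inverse, so this undoes the shuffle.
decrypted = apply_permutation(encrypted, np.argsort(perm))
```

In this sketch a pixel never leaves its ring, whereas a uniformly random permutation (the baseline the TL;DR reports as severely degrading accuracy) can send any pixel anywhere in the image.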
