Can Encrypted Images Still Train Neural Networks? Investigating Image Information and Random Vortex Transformation

XiaoKai Cao, WenJin Mo, ChangDong Wang, JianHuang Lai, Qiong Huang

TL;DR

This work defines a formal image information-content framework that combines pixel values with a nonlocal, distance-based neighboring-information term, enabling theoretical analysis of how transformations affect information. It proves that pixel swaps cannot increase total information content and introduces Random Vortex Transformation (RVT), an encryption that disturbs pixel coordinates along circumferences while preserving neighborhood relationships. The authors validate RVT experimentally, showing that neural models such as ResNet and Vision Transformer can be trained on RVT-encrypted data with minimal accuracy loss on MNIST and Fashion-MNIST and a modest loss on CIFAR-10, while random permutations severely degrade performance. The study demonstrates a path toward privacy-preserving machine learning on encrypted imagery and highlights a tangible link between information-content loss and accuracy degradation, with potential extensions to federated learning.

Abstract

Vision is one of the essential sources through which humans acquire information. In this paper, we establish a novel framework for measuring image information content to evaluate the variation in information content during image transformations. Within this framework, we design a nonlinear function to calculate the neighboring information content of pixels at different distances, and then use this information to measure the overall information content of the image. We then define a function to represent the variation in information content during image transformations. Using this framework, we prove that swapping the positions of any two pixels cannot increase the image's information content. Based on the same framework, we propose a novel image encryption algorithm called Random Vortex Transformation. This algorithm encrypts the image using random functions while preserving the neighboring information of the pixels. The encrypted images are difficult for the human eye to interpret, yet machine learning models can still be trained on them directly. Experimental verification demonstrates that training ResNet and Vision Transformers on the encrypted dataset reduces accuracy by only 0.3% to 6.5% compared to the original data, while ensuring the security of the data. Furthermore, there is a positive correlation between the rate of information loss in the images and the rate of accuracy loss, further supporting the validity of the proposed image information content measurement framework.
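To make the idea of moving pixels along circles concrete, here is a minimal Python sketch of a vortex-style coordinate transform, compared against a random pixel permutation. It is not the authors' Random Vortex Transformation: this summary does not give the paper's random functions, so the angle profile `phi`, the `strength` parameter, the seed-as-key convention, and the `neighbor_roughness` measure are all assumptions made for illustration. The sketch also uses nearest-neighbor resampling, so unlike a true encryption it is not guaranteed to be invertible.

```python
import numpy as np


def vortex_transform(img, seed, strength=2.0):
    """Rotate every pixel about the image center by a random, radius-dependent angle.

    Illustrative stand-in only; the angle profile below is hypothetical.
    """
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - cy, xs - cx
    r = np.hypot(dy, dx)          # distance of each pixel from the center
    theta = np.arctan2(dy, dx)    # polar angle of each pixel

    # Hypothetical key: two random coefficients of a smooth angle profile phi(r).
    a, b = rng.uniform(-strength, strength, size=2)
    rn = r / r.max()
    phi = a * rn + b * rn ** 2

    # Inverse mapping: each output pixel reads a source pixel on the same circle.
    src_y = np.clip(np.rint(cy + r * np.sin(theta - phi)).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(cx + r * np.cos(theta - phi)).astype(int), 0, w - 1)
    return img[src_y, src_x]


def random_permutation(img, seed):
    """Baseline: shuffle all pixel positions uniformly at random."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    perm = rng.permutation(h * w)
    flat = img.reshape(h * w, *img.shape[2:])
    return flat[perm].reshape(img.shape)


def neighbor_roughness(img):
    """Mean absolute difference between adjacent pixels (an assumed proxy, not the paper's measure)."""
    img = np.asarray(img, dtype=float)
    dh = np.abs(np.diff(img, axis=1)).mean()
    dv = np.abs(np.diff(img, axis=0)).mean()
    return (dh + dv) / 2.0


if __name__ == "__main__":
    # Smooth test image: a diagonal intensity ramp.
    img = np.add.outer(np.arange(32), np.arange(32)).astype(float)
    print("original   :", neighbor_roughness(img))
    print("vortex     :", neighbor_roughness(vortex_transform(img, seed=7)))
    print("permutation:", neighbor_roughness(random_permutation(img, seed=7)))
```

Because neighboring pixels at similar radii receive similar rotation angles, the vortex-style output stays locally smooth, whereas the permutation scatters neighbors across the whole image. This mirrors, in a rough way, the contrast the paper draws between RVT and random permutations.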

Paper Structure

This paper contains 17 sections, 1 theorem, 19 equations, 13 figures, 2 tables.

Key Result

Theorem 3.1

Under the conditions stated in the Definition subsection, when swapping the positions of any two points in the image, the total information content of the image either decreases or remains unchanged.
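The statement can be written compactly as an inequality. The notation here is assumed for illustration and is not taken from the paper: $X$ denotes the image, $I(\cdot)$ the total information content defined by the framework, and $\sigma_{p,q}$ the operation that swaps the pixels at positions $p$ and $q$.

```latex
I\bigl(\sigma_{p,q}(X)\bigr) \le I(X) \qquad \text{for all pixel positions } p \neq q .
```

Equality corresponds to the "remains unchanged" case of the theorem.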

Figures (13)

  • Figure 1: Researching and analyzing the application scenarios of image information.
  • Figure 2: Cartesian coordinate system.
  • Figure 3: The graph of the function $m_{neig}(\cdot,\cdot)$.
  • Figure 4: The neighboring information of each pixel is determined not only by the surrounding eight pixels but also by pixels at varying distances, which contribute different amounts of neighboring information. As shown in the figure, for recognizing the eyes, the red region provides the most information, while the green region, which carries broader facial information, provides a smaller amount. The background region (blue arrow) contributes almost no information.
  • Figure 5: Design objectives of the encryption scheme.
  • ...and 8 more figures

Theorems & Definitions (2)

  • Theorem 3.1
  • proof