Privacy-Preserving Vision Transformer Using Images Encrypted with Restricted Random Permutation Matrices
Kouki Horio, Kiyoshi Nishikawa, Hitoshi Kiya
TL;DR
The paper addresses privacy-preserving fine-tuning of Vision Transformers in untrusted cloud environments where traditional encrypted-image approaches degrade performance. It introduces a method that encrypts images via block scrambling and pixel permutation using restricted random permutation matrices controlled by secret keys, enabling fine-tuning of a pre-trained ViT on encrypted data. Experiments on CIFAR-10 with a ViT backbone show that appropriately chosen encryption parameters can match or exceed the performance of non-encrypted baselines, while also improving training efficiency. This approach offers practical value for deploying privacy-preserving ViTs in cloud settings, since providers operate only on encrypted data and lack access to the secret keys or plain images.
Abstract
We propose a novel method for privacy-preserving fine-tuning of vision transformers (ViTs) with encrypted images. With conventional methods, models trained on encrypted images perform worse than models trained on plain images because of the influence of image encryption. In contrast, the proposed encryption method, which uses restricted random permutation matrices, can achieve higher performance than the conventional methods.
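The two encryption steps named in the TL;DR, block scrambling and pixel permutation under secret keys, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `keyed_permutation` and `encrypt_image`, the use of integer seeds as keys, and the choice to apply one pixel permutation to every block are all assumptions made for illustration. The restriction placed on the permutation matrices is not described in this section, so the sketch uses plain, unrestricted key-seeded permutations.

```python
import random


def keyed_permutation(n, key):
    """Return a permutation of range(n) determined by the secret key (hypothetical helper)."""
    perm = list(range(n))
    random.Random(key).shuffle(perm)  # a private RNG seeded with the key
    return perm


def encrypt_image(img, block_size, key_block, key_pixel):
    """Scramble block positions, then permute pixel positions inside each block.

    img is a list of rows; each entry is one pixel (an int or an RGB tuple).
    Assumption: the same pixel permutation is applied to every block.
    """
    h, w = len(img), len(img[0])
    if h % block_size or w % block_size:
        raise ValueError("image size must be a multiple of block_size")
    gh, gw = h // block_size, w // block_size

    # Flatten each block into a list of pixels, blocks in row-major order.
    blocks = [
        [img[by * block_size + y][bx * block_size + x]
         for y in range(block_size) for x in range(block_size)]
        for by in range(gh) for bx in range(gw)
    ]

    block_perm = keyed_permutation(gh * gw, key_block)
    pixel_perm = keyed_permutation(block_size * block_size, key_pixel)

    # Output block i takes its content from input block block_perm[i];
    # output pixel j inside a block takes input pixel pixel_perm[j].
    scrambled = [[blocks[b][p] for p in pixel_perm] for b in block_perm]

    # Write the scrambled blocks back into an image-shaped grid.
    out = [[None] * w for _ in range(h)]
    for i, block in enumerate(scrambled):
        by, bx = divmod(i, gw)
        for j, value in enumerate(block):
            y, x = divmod(j, block_size)
            out[by * block_size + y][bx * block_size + x] = value
    return out


# Usage: encrypt a 4x4 grayscale image with 2x2 blocks and two secret keys.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
encrypted = encrypt_image(image, block_size=2, key_block=1234, key_pixel=5678)
```

Because both permutations are derived only from the keys, a party that holds the encrypted images but not the keys cannot directly reorder the pixels back into the plain image, which is the setting the TL;DR describes for the cloud provider. Setting the block size equal to the ViT patch size would keep each encrypted block aligned with one patch, though that pairing is likewise an assumption here.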
