Privacy-Preserving Vision Transformer Using Images Encrypted with Restricted Random Permutation Matrices

Kouki Horio; Kiyoshi Nishikawa; Hitoshi Kiya

TL;DR

The paper addresses privacy-preserving fine-tuning of Vision Transformers in untrusted cloud environments where traditional encrypted-image approaches degrade performance. It introduces a method that encrypts images via block scrambling and pixel permutation using restricted random permutation matrices controlled by secret keys, enabling fine-tuning of a pre-trained ViT on encrypted data. Experiments on CIFAR-10 with a ViT backbone show that appropriately chosen encryption parameters can match or exceed the performance of non-encrypted baselines, while also improving training efficiency. This approach offers practical value for deploying privacy-preserving ViTs in cloud settings, since providers operate only on encrypted data and lack access to the secret keys or plain images.
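The two encryption steps named above, block scrambling and pixel permutation, can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. It assumes that "restricted" means only a chosen number of positions are permuted while the rest stay fixed, and that `n_bs` and `n_ps` correspond to the $N_{bs}$ and $N_{ps}$ values in the figure captions. The function names, the integer `key` seeding, and the block layout are hypothetical.

```python
import numpy as np

def restricted_permutation(n, n_moved, rng):
    """Index permutation of range(n) in which at most n_moved positions change.

    Assumption: the restriction leaves all other positions fixed.
    """
    perm = np.arange(n)
    if n_moved > 1:  # a single position cannot be moved on its own
        chosen = rng.choice(n, size=n_moved, replace=False)
        perm[chosen] = rng.permutation(chosen)
    return perm

def encrypt(image, block, n_bs, n_ps, key):
    """Block-scramble and pixel-permute an (H, W, C) image with a secret key."""
    h, w, c = image.shape
    gh, gw = h // block, w // block
    rng = np.random.default_rng(key)  # hypothetical: the key seeds the generator

    # Split into non-overlapping blocks and flatten each block to a vector.
    blocks = (image.reshape(gh, block, gw, block, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(gh * gw, block * block * c))

    block_perm = restricted_permutation(gh * gw, n_bs, rng)            # block scrambling
    pixel_perm = restricted_permutation(block * block * c, n_ps, rng)  # pixel permutation
    blocks = blocks[block_perm][:, pixel_perm]

    # Reassemble the permuted blocks into an image of the original shape.
    return (blocks.reshape(gh, gw, block, block, c)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(h, w, c))
```

Under these assumptions, a 224×224×3 image with 16×16 blocks has 196 blocks of 768 values each, which lines up with the largest parameter values in the figure captions. Setting both parameters to 0 leaves the image unchanged.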

Abstract

We propose a novel method for privacy-preserving fine-tuning of vision transformers (ViTs) with encrypted images. Conventional methods that use encrypted images yield lower model performance than training on plain images because of the influence of image encryption. In contrast, the proposed encryption method, which uses restricted random permutation matrices, can provide higher performance than the conventional methods.

Paper Structure (9 sections, 2 equations, 8 figures, 1 table)

Introduction
Proposed method
Overview of proposed method
Image encryption
Example of restricted random permutation matrices
Experiments
Classification accuracy
Training efficiency
Conclusion

Figures (8)

Figure 1: Framework of privacy-preserving ViT
Figure 2: Plain
Figure 3: $N_{bs}=0, N_{ps}=0$
Figure 4: $N_{bs}=0, N_{ps}=768$
Figure 5: $N_{bs}=196, N_{ps}=0$
...and 3 more figures
