Efficient Fine-Tuning with Domain Adaptation for Privacy-Preserving Vision Transformer

Teru Nagamori; Sayaka Shiota; Hitoshi Kiya

TL;DR

This work tackles privacy-preserving image classification with Vision Transformers (ViT), allowing models to be trained and tested on encrypted images without substantial accuracy loss. It introduces a domain-adaptation-based fine-tuning method that adapts the ViT embedding process to encrypted inputs in a key-dependent way, which makes block-wise encryption and pixel shuffling usable in practice. The method reaches near-baseline accuracy on the CIFAR-10, CIFAR-100, and Imagenette datasets, trains more efficiently than fine-tuning on encrypted images without the adaptation, and outperforms existing privacy-preserving approaches. These results support practical deployment of ViT-based systems in untrusted cloud settings where data privacy is critical.
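To make "block-wise encryption and pixel shuffling" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `encrypt_image`, the use of one shared pixel permutation for all blocks, and deriving both permutations from a single integer key are hypothetical choices.

```python
import numpy as np

def encrypt_image(img, block, key):
    """Block scrambling followed by pixel shuffling inside each block.

    img   : (H, W, C) array, with H and W divisible by `block`
    block : block size, assumed here to match the ViT patch size
    key   : integer secret key (hypothetical key format)
    """
    h, w, c = img.shape
    gh, gw = h // block, w // block
    rng = np.random.default_rng(key)
    block_perm = rng.permutation(gh * gw)             # new order of the blocks
    pixel_perm = rng.permutation(block * block * c)   # new order of values in a block

    # Split the image into flattened blocks: (gh * gw, block * block * c)
    blocks = img.reshape(gh, block, gw, block, c).transpose(0, 2, 1, 3, 4)
    blocks = blocks.reshape(gh * gw, block * block * c)

    blocks = blocks[block_perm]       # block scrambling
    blocks = blocks[:, pixel_perm]    # pixel shuffling, same permutation in every block

    # Reassemble the blocks into an image of the original shape
    out = blocks.reshape(gh, gw, block, block, c).transpose(0, 2, 1, 3, 4)
    return out.reshape(h, w, c)
```

Because both steps are permutations, the encrypted image contains exactly the same pixel values as the original, only rearranged, and the same key always produces the same arrangement.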

Abstract

We propose a novel method for privacy-preserving deep neural networks (DNNs) with the Vision Transformer (ViT). The method allows us not only to train and test models with visually protected images but also to avoid the performance degradation caused by the use of encrypted images, which conventional methods cannot avoid. A domain adaptation method is used to efficiently fine-tune ViT with encrypted images. In experiments, the method is demonstrated to outperform conventional methods in terms of classification accuracy in an image classification task on the CIFAR-10 and ImageNet datasets.
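One way to see why a key-dependent adaptation can remove the accuracy penalty is the following sketch. It assumes (this page does not spell it out) that the adaptation permutes the pre-trained patch-embedding weights and position embeddings with the same key used for encryption. The function name `adapt_embeddings` and all array sizes are hypothetical.

```python
import numpy as np

def adapt_embeddings(W, pos, pixel_perm, block_perm):
    """Permute embedding parameters to match an encrypted input.

    W   : (embed_dim, patch_dim) linear patch-embedding weights
    pos : (num_patches, embed_dim) position embeddings
    """
    W_adapted = W[:, pixel_perm]    # follow the pixel shuffling inside each patch
    pos_adapted = pos[block_perm]   # follow the block scrambling across patches
    return W_adapted, pos_adapted

# Toy stand-ins for a pre-trained model and one image split into patches
rng = np.random.default_rng(0)
patch_dim, embed_dim, num_patches = 48, 8, 4
W = rng.normal(size=(embed_dim, patch_dim))
pos = rng.normal(size=(num_patches, embed_dim))
patches = rng.normal(size=(num_patches, patch_dim))

# Key-derived permutations (hypothetical key handling)
key_rng = np.random.default_rng(1234)
pixel_perm = key_rng.permutation(patch_dim)
block_perm = key_rng.permutation(num_patches)

# Encrypt: scramble the patch order, then shuffle values inside each patch
encrypted = patches[block_perm][:, pixel_perm]

W_adapted, pos_adapted = adapt_embeddings(W, pos, pixel_perm, block_perm)
plain_tokens = patches @ W.T + pos
enc_tokens = encrypted @ W_adapted.T + pos_adapted

# The encrypted tokens equal the plain tokens up to a reordering of patches
tokens_match = np.allclose(enc_tokens, plain_tokens[block_perm])
```

In this toy setting the adapted embedding yields the same tokens as the plain model, only in scrambled order. Since self-attention does not depend on token order once position embeddings have been added, fine-tuning on encrypted images would then start from a point equivalent to the pre-trained model.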

Paper Structure (14 sections, 7 equations, 6 figures, 3 tables)

Introduction
Related work
  Image Encryption for Deep Learning
  Vision Transformer
Proposed Method
  Overview
  Image Encryption
  Fine-tuning with Domain Adaptation
Experiments
  Setup
  Classification Performance
  Training Efficiency
  Comparison with Conventional Methods
Conclusion

Figures (6)

Figure 1: Architecture of Vision Transformer ($N$=9)
Figure 2: Overview of proposed fine-tuning
Figure 3: Procedure of image encryption
Figure 4: Example of encrypted images
Figure 5: Test procedure
...and 1 more figure
