Disposable-key-based image encryption for collaborative learning of Vision Transformer
Rei Aso, Sayaka Shiota, Hitoshi Kiya
TL;DR
This work addresses privacy-preserving collaborative learning for the Vision Transformer (ViT) by using learnable encryption to encrypt each training image with its own key. Training proceeds on encrypted data that is transmitted only once to a central server, which reduces communication and client-side computation compared with traditional federated approaches. The method applies block scrambling and pixel permutation via random permutation matrices, and introduces restricted random permutation matrices to mitigate accuracy loss while preserving privacy. Evaluations on CIFAR-10 show that ViT can be fine-tuned on the encrypted data and that restricting the permutation matrices improves the trade-off between accuracy and security, which makes the approach relevant to privacy-conscious multi-client learning.
Abstract
We propose a novel method for securely training the vision transformer (ViT) with sensitive data shared by multiple clients, in a manner similar to privacy-preserving federated learning. In the proposed method, each client independently encrypts its training images with encryption keys that it prepares itself, and, for the first time, ViT is trained using these encrypted images. The method allows clients not only to dispose of the keys but also to reduce the communication costs between a central server and the clients. In image classification experiments on the CIFAR-10 dataset, we verify the effectiveness of the proposed method in terms of classification accuracy and the use of restricted random permutation matrices.
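As a rough illustration of the block scrambling and pixel permutation named in the TL;DR, the following Python sketch encrypts one image with a freshly drawn key and then discards that key. It is not the authors' implementation. The function names (`make_key`, `split_blocks`, `encrypt_image`), the nested-list image layout, and the choice to reuse one pixel permutation for every block of an image are assumptions made for illustration. The restricted random permutation matrices are not sketched, since their construction is not described in this section.

```python
import random


def make_key(n, rng):
    """Draw a random permutation of range(n) to serve as a one-time key."""
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def split_blocks(image, block):
    """Split an H x W x C nested-list image into flattened blocks (row-major)."""
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("image height and width must be multiples of block")
    blocks = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            flat = [
                value
                for y in range(by, by + block)
                for x in range(bx, bx + block)
                for value in image[y][x]
            ]
            blocks.append(flat)
    return blocks


def encrypt_image(image, block, rng=None):
    """Encrypt one image with per-image keys; the keys are not returned."""
    # Assumption: an OS-backed generator is used unless a seeded one is
    # passed in (seeding is only meant for reproducible experiments).
    rng = rng or random.SystemRandom()
    blocks = split_blocks(image, block)

    # Block scrambling: permute the order of the blocks.
    block_key = make_key(len(blocks), rng)
    scrambled = [blocks[i] for i in block_key]

    # Pixel permutation: permute the values inside each block.
    # Assumption: the same permutation is applied to every block of this image.
    pixel_key = make_key(len(blocks[0]), rng)
    encrypted = [[b[j] for j in pixel_key] for b in scrambled]

    # Both keys go out of scope here, mirroring the "disposable key" idea.
    return encrypted


# Usage: a toy 4 x 4 RGB image split into 2 x 2 blocks.
toy = [[(y * 4 + x, 0, 1) for x in range(4)] for y in range(4)]
cipher_blocks = encrypt_image(toy, block=2)
```

Because a new key pair is drawn inside every call, two calls on the same image generally give different outputs, which is what allows the client to discard the keys after sending the encrypted data.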
