
Privacy-Preserving Federated Vision Transformer Learning Leveraging Lightweight Homomorphic Encryption in Medical AI

Al Amin, Kamrul Hasan, Liang Hong, Sharif Ullah

TL;DR

This work tackles privacy in collaborative medical-imaging AI by integrating Vision Transformers with CKKS homomorphic encryption. It targets secure aggregation of compact $D=768$-dimensional CLS tokens rather than raw data or full gradients. Encrypting CLS tokens instead of gradients reduces per-sample communication by about $30\times$, to roughly $326$ KB. It also protects against gradient-based reconstruction attacks, which on unprotected gradients achieve near-perfect image recovery (PSNR $=52.26$ dB, SSIM $=0.999$). The framework supports server-side encrypted inference with a polynomial activation, reaching $90.02\%$ accuracy in the encrypted domain versus $96.12\%$ unencrypted. Experiments in a three-client histopathology setting show practical training and inference costs: encrypted inference takes about $66$ ms per image on a CPU with modest memory. This suggests feasible deployment in bandwidth-limited clinical networks. Overall, the approach offers a favorable balance between privacy, communication efficiency, and diagnostic accuracy for multi-institution medical AI.
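The communication argument rests on how CKKS packs vectors. A single ciphertext with polynomial degree $N$ has $N/2$ slots, so a 768-D token fits in one ciphertext, while a gradient vector needs a number of ciphertexts proportional to its length. The sketch below makes that arithmetic concrete. The parameters `N = 8192`, the coefficient-modulus bit sizes, and `HYPOTHETICAL_GRAD_DIM` are illustrative assumptions, not values reported by the paper. `packed_ciphertext_bytes` is a hypothetical helper giving a rough bit-packed estimate, not the size any particular library serializes.

```python
import math


def ckks_slots(poly_modulus_degree: int) -> int:
    # CKKS packs N/2 real values into the slots of a single ciphertext.
    return poly_modulus_degree // 2


def ciphertexts_needed(vector_len: int, poly_modulus_degree: int) -> int:
    # Number of ciphertexts required to pack a flat vector of this length.
    return math.ceil(vector_len / ckks_slots(poly_modulus_degree))


def packed_ciphertext_bytes(poly_modulus_degree: int, coeff_mod_bit_sizes: list[int]) -> int:
    # Hypothetical rough estimate: a fresh ciphertext is two polynomials with N
    # coefficients modulo q = q_1 * ... * q_L. Real libraries may store each RNS
    # limb in a 64-bit word and/or compress, so actual sizes will differ.
    return 2 * poly_modulus_degree * sum(coeff_mod_bit_sizes) // 8


# Assumed example parameters (not taken from the paper).
N = 8192
COEFF_BITS = [60, 40, 40, 60]
CLS_DIM = 768                     # [CLS] dimension stated in the paper
HYPOTHETICAL_GRAD_DIM = 100_000   # illustrative gradient length only

cls_cts = ciphertexts_needed(CLS_DIM, N)
grad_cts = ciphertexts_needed(HYPOTHETICAL_GRAD_DIM, N)
ct_bytes = packed_ciphertext_bytes(N, COEFF_BITS)

print(f"slots per ciphertext: {ckks_slots(N)}")
print(f"[CLS] ciphertexts: {cls_cts}, ~{cls_cts * ct_bytes / 1024:.0f} KiB")
print(f"gradient ciphertexts: {grad_cts}, ~{grad_cts * ct_bytes / 1024:.0f} KiB")
print(f"plaintext float32 [CLS]: {CLS_DIM * 4} bytes")
```

With these assumed parameters, the whole token occupies one ciphertext of roughly 400 KiB, which is the same order of magnitude as the reported 326 KB but is not meant to reproduce it. The hypothetical 100,000-value gradient needs 25 ciphertexts. Note also that encryption expands a 3 KB plaintext token by two orders of magnitude, so the saving comes from sending one ciphertext instead of many.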

Abstract

Collaborative machine learning across healthcare institutions promises improved diagnostic accuracy by leveraging diverse datasets, yet privacy regulations such as HIPAA restrict direct sharing of patient data. While federated learning (FL) enables decentralized training without raw data exchange, recent studies show that model gradients in conventional FL remain vulnerable to reconstruction attacks, potentially exposing sensitive medical information. This paper presents a privacy-preserving federated learning framework combining Vision Transformers (ViT) with homomorphic encryption (HE) for secure multi-institutional histopathology classification. The approach uses the ViT CLS token as a compact 768-dimensional feature representation for secure aggregation, encrypting these tokens with CKKS homomorphic encryption before transmission to the server. We demonstrate that encrypting CLS tokens achieves a 30-fold communication reduction compared to gradient encryption while maintaining strong privacy guarantees. Through evaluation on a three-client federated setup for lung cancer histopathology classification, we show that gradients are highly susceptible to model inversion attacks (PSNR: 52.26 dB, SSIM: 0.999, NMI: 0.741), enabling near-perfect image reconstruction. In contrast, the proposed CLS-protected HE approach prevents such attacks while enabling encrypted inference directly on ciphertexts, requiring only 326 KB of encrypted data transmission per aggregation round. The framework achieves 96.12% global classification accuracy in the unencrypted domain and 90.02% in the encrypted domain.
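The attack evaluated in the paper is not specified in this section. To show why shared gradients can expose inputs at all, the sketch below instead uses a well-known closed-form leak of a fully connected layer with bias, $y = Wx + b$. Because $\partial L/\partial W = (\partial L/\partial y)\,x^\top$ and $\partial L/\partial b = \partial L/\partial y$, dividing any row of the weight gradient by the matching bias gradient returns $x$ exactly. The sketch then scores the recovery with PSNR. The layer sizes, the squared-error loss, and the helper names are illustrative assumptions, and this is not the paper's model inversion method.

```python
import numpy as np


def linear_layer_grads(W, b, x, target):
    # Forward pass of y = W x + b with loss L = 0.5 * ||y - target||^2.
    y = W @ x + b
    dL_dy = y - target              # shape (out,)
    dL_dW = np.outer(dL_dy, x)      # shape (out, in): row i equals dL_dy[i] * x
    dL_db = dL_dy                   # shape (out,)
    return dL_dW, dL_db


def recover_input(dL_dW, dL_db):
    # Any row with a nonzero bias gradient reveals x: x = dL_dW[i] / dL_db[i].
    i = int(np.argmax(np.abs(dL_db)))
    return dL_dW[i] / dL_db[i]


def psnr(original, reconstruction, data_range=1.0):
    # Peak signal-to-noise ratio in dB; infinite for a bit-exact match.
    mse = np.mean((original - reconstruction) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


rng = np.random.default_rng(0)
image = rng.random(64)              # stand-in for a flattened 8x8 patch in [0, 1]
W = rng.normal(size=(3, 64))
b = rng.normal(size=3)
target = np.array([1.0, 0.0, 0.0])

grad_W, grad_b = linear_layer_grads(W, b, image, target)
leaked = recover_input(grad_W, grad_b)
print(f"PSNR of recovered input: {psnr(image, leaked):.1f} dB")
```

For scale, assume pixel values in $[0, 1]$. Inverting the reported 52.26 dB gives $\mathrm{MSE} = 10^{-5.226} \approx 5.9 \times 10^{-6}$, an RMS error of about $0.0024$. That is roughly 0.6 intensity levels on an 8-bit scale, which is why such scores indicate near-perfect reconstruction.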

Paper Structure

This paper contains 23 sections, 19 equations, 4 figures, 3 tables, 2 algorithms.

Figures (4)

  • Figure 1: Privacy-preserving FL via encrypted [CLS] tokens: clients extract the 768-D [CLS] token from a ViT and encrypt it with CKKS; the server aggregates across $N$ clients and runs encrypted inference.
  • Figure 2: Client training accuracy over 30 epochs; final accuracies: 94.64%, 94.14%, and 91.52%.
  • Figure 3: Model inversion on gradient: high PSNR/SSIM/NMI (avg. 52.26 dB/0.999/0.741) indicates near-perfect reconstructions and severe privacy leakage.
  • Figure 4: Global performance (accuracy, F1, precision, recall) across four configurations. Encrypted [CLS] attains 90.02% accuracy, +4.67 pp vs. encrypted gradients, with 30$\times$ lower communication.
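Encrypted inference on the server relies on a polynomial activation, because CKKS evaluates only additions and multiplications. The paper's specific polynomial and head architecture are not given here. The sketch below therefore assumes the following:

- a degree-2 least-squares fit to ReLU on an assumed input range of $[-4, 4]$;
- a hypothetical two-layer head, `he_friendly_head`, evaluated in plaintext to show the operation set.

On ciphertexts, the square term would require one ciphertext-ciphertext multiplication.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


# Assumption: degree-2 least-squares fit to ReLU on [-4, 4].
xs = np.linspace(-4.0, 4.0, 2001)
coeffs = np.polyfit(xs, relu(xs), deg=2)      # highest power first: [a2, a1, a0]
max_fit_error = np.max(np.abs(np.polyval(coeffs, xs) - relu(xs)))


def he_friendly_head(h, W1, b1, W2, b2):
    # Hypothetical head using only + and *, i.e. operations CKKS supports.
    z = W1 @ h + b1
    a = coeffs[0] * z * z + coeffs[1] * z + coeffs[2]   # polynomial activation
    return W2 @ a + b2


rng = np.random.default_rng(0)
h = rng.normal(size=768)                       # stand-in for a 768-D [CLS] token
W1 = rng.normal(size=(16, 768)) / np.sqrt(768)
W2 = rng.normal(size=(3, 16)) / 4.0
logits = he_friendly_head(h, W1, np.zeros(16), W2, np.zeros(3))

print("polynomial coefficients:", np.round(coeffs, 4))
print(f"max |poly - ReLU| on [-4, 4]: {max_fit_error:.3f}")
print("logits:", logits)
```

By symmetry, the linear coefficient comes out as exactly $0.5$. The approximation error is largest near zero, at about $0.375$. Outside the fitted range the quadratic diverges from ReLU, so inputs must be scaled into that range. Approximation error of this kind is one plausible contributor to the gap between encrypted and unencrypted accuracy, though the figures above do not isolate it.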