DCT-CryptoNets: Scaling Private Inference in the Frequency Domain

Arjun Roy, Kaushik Roy

TL;DR

DCT-CryptoNets addresses private inference by moving CNN computations to the frequency domain using the discrete cosine transform (DCT), which reduces the cost of non-linear activations and homomorphic bootstrap operations in TFHE. It combines architectural changes in the DCT domain with quantization-aware training and programmable bootstrapping to enable deeper encrypted networks on large datasets. The method reports up to 5.3$\times$ latency reduction relative to prior FHENN work and enables ImageNet inference in about 2.5 hours on 96-thread resources, with accuracy comparable to RGB baselines and improved reliability due to fewer bootstrap steps. Collectively, the work demonstrates that frequency-domain private inference can scale to high-resolution images with practical latency, offering a viable path toward real-world privacy-preserving deep learning.

Abstract

The convergence of fully homomorphic encryption (FHE) and machine learning offers unprecedented opportunities for private inference of sensitive data. FHE enables computation directly on encrypted data, safeguarding the entire machine learning pipeline, including data and model confidentiality. However, existing FHE-based implementations for deep neural networks face significant challenges in computational cost, latency, and scalability, limiting their practical deployment. This paper introduces DCT-CryptoNets, a novel approach that operates directly in the frequency domain to reduce the burden of computationally expensive non-linear activations and homomorphic bootstrap operations during private inference. It does so by utilizing the discrete cosine transform (DCT), commonly employed in JPEG encoding, which is inherently compatible with remote computing services, where images are generally stored and transmitted in this encoded format. DCT-CryptoNets demonstrates substantial latency reductions of up to 5.3$\times$ compared to prior work on benchmark image classification tasks. Notably, it demonstrates inference on the ImageNet dataset within 2.5 hours (down from 12.5 hours on equivalent 96-thread compute resources). Furthermore, by learning perceptually salient low-frequency information, DCT-CryptoNets improves the reliability of encrypted predictions compared to RGB-based networks by reducing the number of error-accumulating homomorphic bootstrap operations. DCT-CryptoNets also scales better than RGB-based networks, with its computational savings growing as image size increases. This study demonstrates a promising avenue for achieving efficient and practical private inference of deep learning models on the high-resolution images seen in real-world applications.
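To make the frequency-domain input concrete, the following is a minimal sketch of a JPEG-style block-wise DCT applied to a single image channel in plaintext, before any encryption. It is not the authors' implementation: the 8$\times$8 block size follows the JPEG convention, while the function names, the choice to keep the top-left `keep`$\times$`keep` coefficients, and the layout that stacks the retained coefficients as channels are assumptions made purely for illustration.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows are frequencies, columns are samples)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    x = np.arange(n).reshape(1, -1)  # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)     # DC row uses a different scale factor
    return mat


def blockwise_dct(channel: np.ndarray, block: int = 8) -> np.ndarray:
    """2D DCT of each non-overlapping block.

    Returns an array of shape (H / block, W / block, block, block).
    """
    h, w = channel.shape
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    d = dct_matrix(block)
    # Split the image into a grid of (block x block) tiles.
    tiles = channel.reshape(h // block, block, w // block, block)
    tiles = tiles.transpose(0, 2, 1, 3)
    # Separable 2D DCT: D @ X @ D^T, broadcast over the tile grid.
    return d @ tiles @ d.T


def keep_low_frequencies(coeffs: np.ndarray, keep: int = 4) -> np.ndarray:
    """Keep the top-left keep x keep coefficients of each block (hypothetical
    selection rule) and stack them as channels: (H / block, W / block, keep**2)."""
    hb, wb = coeffs.shape[:2]
    return coeffs[:, :, :keep, :keep].reshape(hb, wb, keep * keep)


# Usage: a 32x32 single-channel image becomes a 4x4 grid with 16 frequency channels.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
features = keep_low_frequencies(blockwise_dct(image), keep=4)
```

Under these assumptions the spatial grid shrinks by the block size in each dimension while the retained low-frequency coefficients move into the channel axis, so a network consuming `features` starts from a much smaller spatial resolution than one consuming raw pixels.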

Paper Structure (25 sections, 2 equations, 4 figures, 9 tables)

This paper contains 25 sections, 2 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Scalability of state-of-the-art image classification methods in FHENN. DCT-CryptoNets reduces latency by up to 5.3$\times$ compared to SHE, the only other published method able to infer on ImageNet. To ensure a fair comparison, SHE latency values were normalized to the same computational resources as DCT-CryptoNets (96 threads). CKKS-based methods have difficulty scaling to larger networks and datasets due to their highly approximate nature.
  • Figure 2: DCT-CryptoNets' frequency encoding (based on dct1) and ResNet-18 network architecture. Modifications from an RGB-based network to DCT-CryptoNets are emphasized in bold and darker purple. These include kernel size and downsampling of the first convolution layer, as well as exclusion of both the ReLU operator and pooling layer after the first convolutional layer. This approach requires minimal modification of existing networks to utilize DCT, making conversion for many potential applications simple.
  • Figure 3: MLaaS system with DCT-CryptoNets.
  • Figure 4: Traditional vs. DCT-CryptoNets ResNet-20 Architecture. Changes between the two architectures are bolded.