DCT-CryptoNets: Scaling Private Inference in the Frequency Domain
Arjun Roy, Kaushik Roy
TL;DR
DCT-CryptoNets addresses private inference by moving CNN computations to the frequency domain using the discrete cosine transform (DCT), which reduces the cost of non-linear activations and homomorphic bootstrap operations in TFHE. It combines architectural changes in the DCT domain with quantization-aware training and programmable bootstrapping to enable deeper encrypted networks on large datasets. The method reports up to 5.3$\times$ latency reduction relative to prior FHENN work and enables ImageNet inference in about 2.5 hours on 96-thread resources, with accuracy comparable to RGB baselines and improved reliability due to fewer bootstrap steps. Collectively, the work demonstrates that frequency-domain private inference can scale to high-resolution images with practical latency, offering a viable path toward real-world privacy-preserving deep learning.
Abstract
The convergence of fully homomorphic encryption (FHE) and machine learning offers unprecedented opportunities for private inference of sensitive data. FHE enables computation directly on encrypted data, safeguarding the entire machine learning pipeline, including data and model confidentiality. However, existing FHE-based implementations for deep neural networks face significant challenges in computational cost, latency, and scalability, limiting their practical deployment. This paper introduces DCT-CryptoNets, a novel approach that operates directly in the frequency domain to reduce the burden of computationally expensive non-linear activations and homomorphic bootstrap operations during private inference. It does so by utilizing the discrete cosine transform (DCT), commonly employed in JPEG encoding, which has inherent compatibility with remote computing services where images are generally stored and transmitted in this encoded format. DCT-CryptoNets demonstrates latency reductions of up to 5.3$\times$ compared to prior work on benchmark image classification tasks. Notably, it demonstrates inference on the ImageNet dataset within 2.5 hours (down from 12.5 hours on equivalent 96-thread compute resources). Furthermore, by learning perceptually salient low-frequency information, DCT-CryptoNets improves the reliability of encrypted predictions compared to RGB-based networks by reducing error-accumulating homomorphic bootstrap operations. DCT-CryptoNets also scales better than RGB-based networks, further reducing computational cost as image size increases. This study demonstrates a promising avenue for achieving efficient and practical private inference of deep learning models on the high-resolution images encountered in real-world applications.
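To make the frequency-domain input concrete, the following is a minimal sketch of a JPEG-style blockwise DCT, not the authors' implementation. It assumes 8×8 blocks, an orthonormal DCT-II, and a simple ordering of coefficients by increasing frequency index sum (a stand-in for JPEG's zigzag order). The names `dct_matrix`, `dct2_block`, `blockwise_dct`, and the `keep` parameter are hypothetical and chosen only for illustration. It is written in plain Python for readability rather than speed.

```python
import math

BLOCK = 8  # JPEG-style block size (an assumption of this sketch)


def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dct2_block(block, basis):
    """2D DCT of one square block, i.e. C @ block @ C^T with plain lists."""
    n = len(basis)
    # First pass: apply the basis along the vertical axis (C @ block).
    tmp = [[sum(basis[k][i] * block[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]
    # Second pass: apply the basis along the horizontal axis (tmp @ C^T).
    return [[sum(tmp[k][j] * basis[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]


def blockwise_dct(image, keep=BLOCK * BLOCK):
    """Map an H x W image to an (H/8) x (W/8) grid of frequency vectors.

    Each grid cell holds the `keep` lowest-frequency coefficients of its
    block, ordered by u + v (simplified low-to-high frequency ordering).
    """
    h, w = len(image), len(image[0])
    assert h % BLOCK == 0 and w % BLOCK == 0, "sketch assumes multiples of 8"
    basis = dct_matrix()
    order = sorted(((u, v) for u in range(BLOCK) for v in range(BLOCK)),
                   key=lambda p: (p[0] + p[1], p[0]))
    grid = []
    for r in range(0, h, BLOCK):
        row = []
        for c in range(0, w, BLOCK):
            block = [image[r + i][c:c + BLOCK] for i in range(BLOCK)]
            coeffs = dct2_block(block, basis)
            row.append([coeffs[u][v] for u, v in order][:keep])
        grid.append(row)
    return grid
```

Under these assumptions, a 16×16 single-channel image becomes a 2×2 grid whose cells each carry up to 64 frequency values, so spatial resolution shrinks by 8 in each dimension while the channel count grows. Choosing a smaller `keep` discards high-frequency coefficients, which is one way to picture how a network could operate on mostly low-frequency information.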
