Hawk: Accurate and Fast Privacy-Preserving Machine Learning Using Secure Lookup Table Computation
Hamza Saleem, Amir Ziashahabi, Muhammad Naveed, Salman Avestimehr
TL;DR
This work tackles privacy-preserving machine learning (PPML) with two two-server protocols, Hawk_Single and Hawk_Multi, that compute nonlinear activations through secret-shared lookup tables rather than Yao's garbled circuits. Hawk_Single offers leakage-free, plaintext-accurate activation evaluation, while Hawk_Multi permits reusable lookup tables under a controlled $\epsilon$-$d_{\mathcal{X}}$-privacy leakage, reducing computation and storage costs. The authors provide detailed constructions for univariate activations (and Softmax) and demonstrate substantial speedups over SecureML (up to $9\times$ for logistic regression and up to $688\times$ for neural network training) with accuracy matching plaintext training (e.g., 96.6% on MNIST in 15 epochs). They also analyze the leakage in the relaxed setting and show how differential privacy notions can bound the information revealed by access patterns. Overall, Hawk makes standard activations practical in a two-server MPC setting by replacing garbled circuits with lookup tables, significantly reducing offline and online costs and widening the range of feasible privacy-preserving collaborative learning scenarios.
Abstract
Training machine learning models on data from multiple entities without direct data sharing can unlock applications otherwise hindered by business, legal, or ethical constraints. In this work, we design and implement new privacy-preserving machine learning protocols for logistic regression and neural network models. We adopt a two-server model where data owners secret-share their data between two servers that train and evaluate the model on the joint data. A significant source of inefficiency and inaccuracy in existing methods arises from using Yao's garbled circuits to compute non-linear activation functions. We propose new methods for computing non-linear functions based on secret-shared lookup tables, offering both computational efficiency and improved accuracy. Beyond introducing leakage-free techniques, we initiate the exploration of relaxed security measures for privacy-preserving machine learning. Instead of claiming that the servers gain no knowledge during the computation, we contend that while some information is revealed about access patterns to lookup tables, it maintains $\epsilon$-$d_{\mathcal{X}}$-privacy. Leveraging this relaxation significantly reduces the computational resources needed for training. We present new cryptographic protocols tailored to this relaxed security paradigm and define and analyze the leakage. Our evaluations show that our logistic regression protocol is up to $9\times$ faster, and the neural network training is up to $688\times$ faster than SecureML. Notably, our neural network achieves an accuracy of 96.6% on MNIST in 15 epochs, outperforming prior benchmarks that capped at 93.4% using the same architecture.
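To make the idea of a secret-shared lookup table concrete, the following is a minimal sketch of a generic one-time, offset-masked table between two servers. It is not the Hawk_Single or Hawk_Multi construction itself, whose details are not given here. The ring size `MOD`, the 8-bit input domain, the trusted dealer in `offline_make_table`, and the example activation `relu8` are all assumptions chosen for illustration.

```python
import secrets

MOD = 2 ** 16  # assumed output ring; small so the example stays tiny


def share(value, mod=MOD):
    """Split `value` into two additive shares that sum to it modulo `mod`."""
    s0 = secrets.randbelow(mod)
    return s0, (value - s0) % mod


def reconstruct(s0, s1, mod=MOD):
    """Recombine two additive shares."""
    return (s0 + s1) % mod


def relu8(x):
    """Hypothetical activation: ReLU on an 8-bit two's-complement input."""
    signed = x - 256 if x >= 128 else x
    return max(signed, 0)


def offline_make_table(f, domain_size, mod=MOD):
    """Assumed dealer step: build one single-use table for f.

    Entry j stores a sharing of f(j - r), so index x + r holds f(x).
    """
    r = secrets.randbelow(domain_size)
    table0, table1 = [], []
    for j in range(domain_size):
        t0, t1 = share(f((j - r) % domain_size) % mod, mod)
        table0.append(t0)
        table1.append(t1)
    r0, r1 = share(r, domain_size)
    return (r0, table0), (r1, table1)


def online_lookup(x_shares, server0, server1, domain_size):
    """Simulate both servers in one process: open x + r, then index locally."""
    (r0, table0), (r1, table1) = server0, server1
    x0, x1 = x_shares
    m0 = (x0 + r0) % domain_size  # message server 0 would send
    m1 = (x1 + r1) % domain_size  # message server 1 would send
    masked = (m0 + m1) % domain_size  # equals x + r; uniform since r is fresh
    return table0[masked], table1[masked]


# Usage: evaluate relu8 on a shared input without either server seeing it.
s0, s1 = offline_make_table(relu8, 256)
y0, y1 = online_lookup(share(5, 256), s0, s1, 256)
print(reconstruct(y0, y1))  # 5
```

Because the offset `r` is used only once, the opened index is uniformly random and reveals nothing about the input. Reusing the same table for a second input would expose the difference between the two inputs through the opened indices, which is the kind of access-pattern leakage that the relaxed setting described above sets out to bound.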
