Hawk: Accurate and Fast Privacy-Preserving Machine Learning Using Secure Lookup Table Computation

Hamza Saleem, Amir Ziashahabi, Muhammad Naveed, Salman Avestimehr

TL;DR

This work tackles privacy-preserving machine learning (PPML) with two protocols for the two-server setting, Hawk_Single and Hawk_Multi, which compute nonlinear activations through secret-shared lookup tables rather than Yao's garbled circuits. Hawk_Single offers leakage-free, plaintext-accurate activation evaluation, while Hawk_Multi permits reusable lookup tables under a controlled $\epsilon$-$d_{\mathcal{X}}$-privacy leakage, reducing computation and storage costs. The authors provide detailed constructions for univariate activations (and Softmax) and report substantial speedups over prior systems (up to $9\times$ for logistic regression and up to $688\times$ for neural networks) with accuracy matching plaintext training (e.g., 96.6% on MNIST in 15 epochs). They also analyze the leakage in the relaxed setting and show how differential privacy notions can bound the information revealed by lookup-table access patterns. Overall, Hawk makes standard activations accurate and practical in two-server MPC by evaluating them with lookup tables, which lowers both offline and online costs and widens the range of feasible privacy-preserving collaborative learning scenarios.
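
For orientation, the $\epsilon$-$d_{\mathcal{X}}$-privacy notion mentioned above is commonly stated as follows in the metric differential privacy literature. This is the standard textbook form, given here as an assumption about what the summary refers to; the paper's own definition may differ in details such as the choice of metric $d_{\mathcal{X}}$ or the output domain. A randomized mechanism $\mathcal{M}$ with input space $\mathcal{X}$ satisfies $\epsilon$-$d_{\mathcal{X}}$-privacy if, for all inputs $x, x' \in \mathcal{X}$ and all measurable output sets $S$:

```latex
\Pr[\mathcal{M}(x) \in S] \;\le\; e^{\epsilon \cdot d_{\mathcal{X}}(x, x')} \cdot \Pr[\mathcal{M}(x') \in S]
```

Intuitively, inputs that are close under $d_{\mathcal{X}}$ produce nearly indistinguishable outputs, and the guarantee weakens gradually as the distance between inputs grows.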

Abstract

Training machine learning models on data from multiple entities without direct data sharing can unlock applications otherwise hindered by business, legal, or ethical constraints. In this work, we design and implement new privacy-preserving machine learning protocols for logistic regression and neural network models. We adopt a two-server model where data owners secret-share their data between two servers that train and evaluate the model on the joint data. A significant source of inefficiency and inaccuracy in existing methods arises from using Yao's garbled circuits to compute non-linear activation functions. We propose new methods for computing non-linear functions based on secret-shared lookup tables, offering both computational efficiency and improved accuracy. Beyond introducing leakage-free techniques, we initiate the exploration of relaxed security measures for privacy-preserving machine learning. Instead of claiming that the servers gain no knowledge during the computation, we contend that while some information is revealed about access patterns to lookup tables, it maintains $\epsilon$-$d_{\mathcal{X}}$-privacy. Leveraging this relaxation significantly reduces the computational resources needed for training. We present new cryptographic protocols tailored to this relaxed security paradigm and define and analyze the leakage. Our evaluations show that our logistic regression protocol is up to $9\times$ faster, and the neural network training is up to $688\times$ faster than SecureML. Notably, our neural network achieves an accuracy of 96.6% on MNIST in 15 epochs, outperforming prior benchmarks that capped at 93.4% using the same architecture.
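
To make the idea of a secret-shared lookup table concrete, here is a minimal sketch of a generic one-time shifted lookup table for Sigmoid between two servers. It is not the paper's Hawk_Single or Hawk_Multi construction. The bit width `K`, the fixed-point parameters `FRAC` and `OUT_SCALE`, the trusted offline dealer, and all function names are assumptions made for illustration only.

```python
import math
import secrets

K = 8                  # index bit width (assumed for illustration)
N = 1 << K             # number of table entries
FRAC = 4               # fractional bits of the fixed-point input (assumed)
OUT_SCALE = 1 << 16    # fixed-point scale of the Sigmoid output (assumed)
MOD = 1 << 32          # ring in which output shares live (assumed)


def to_signed(i):
    """Interpret a K-bit index as a two's-complement integer."""
    return i - N if i >= N // 2 else i


def sigmoid_fixed(i):
    """Plaintext Sigmoid on a K-bit fixed-point input, scaled to an integer."""
    x = to_signed(i) / (1 << FRAC)
    return round(OUT_SCALE / (1.0 + math.exp(-x)))


def share(value, mod):
    """Split value into two additive shares modulo mod."""
    s0 = secrets.randbelow(mod)
    return s0, (value - s0) % mod


def offline_setup():
    """Hypothetical dealer: build a table shifted by a random r and share it."""
    r = secrets.randbelow(N)
    r0, r1 = share(r, N)
    t0, t1 = [], []
    for j in range(N):
        # Entry j stores f(j - r), so index x + r later yields f(x).
        a, b = share(sigmoid_fixed((j - r) % N), MOD)
        t0.append(a)
        t1.append(b)
    return (r0, t0), (r1, t1)


def online_lookup(x0, x1, state0, state1):
    """Servers open the masked index x + r and read their table shares."""
    (r0, t0), (r1, t1) = state0, state1
    m0 = (x0 + r0) % N          # message sent by server 0
    m1 = (x1 + r1) % N          # message sent by server 1
    masked = (m0 + m1) % N      # equals x + r mod N, uniform for fresh r
    return t0[masked], t1[masked]


def reconstruct(y0, y1):
    """Recombine output shares (done here only to check correctness)."""
    return (y0 + y1) % MOD
```

Because the opened index is masked by a fresh random shift, a single lookup reveals nothing about the input, and the reconstructed output equals the plaintext table entry exactly. Using the same shifted table for a second input would expose the difference between the two inputs, which suggests why reusing tables requires a separate leakage analysis.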

Paper Structure (15 sections, 1 theorem, 5 figures, 1 table, 9 algorithms)

Key Result

Theorem 1

Protocol $\Pi_{\mathsf{Hawk}_{\mathsf{Single}}}^{\mathsf{online}}$ (the online lookup algorithm for $\mathsf{Hawk}_{\mathsf{Single}}$) securely realizes the functionality $\mathcal{F}_{\mathsf{SingleLookup}}$ against semi-honest adversaries.

Figures (5)

  • Figure 1: Overview of our PPML setup.
  • Figure 2: Accuracy comparison of plaintext training with our PPLR protocol under different bit representations for the Sigmoid function: (a) MNIST ($|B| = 128$), (b) Arcene ($|B| = 100$).
  • Figure 3: Ideal Functionality $\mathcal{F}_{\mathsf{SingleLookup}}$
  • Figure 4: Ideal Functionality $\mathcal{F}_{\mathsf{SC}}$
  • Figure 5: Ideal Functionality $\mathcal{F}_{\mathsf{MultiLookup}}$

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Theorem 1