Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware
Florian Tramèr, Dan Boneh
TL;DR
The paper addresses the performance cost of running ML inference inside trusted hardware by splitting DNN inference: nonlinear, control-sensitive components stay inside a trusted enclave, while the heavy linear layers are outsourced to a fast, untrusted co-processor. It introduces Slalom, which combines quantization, Freivalds' algorithm for integrity, and a lightweight input-blinding scheme to achieve verifiable and private inference with substantial throughput gains on canonical models (e.g., VGG16, MobileNet, ResNet). The authors provide formal security arguments, implement an SGX-based DNN library, and demonstrate 6×–20× throughput improvements for verifiable inference and 4×–11× for verifiable and private inference, compared to running entirely in the TEE. They also discuss the challenges and open directions for verifiable and private training, and show that Slalom scales favorably with model size, which makes secure ML more practical on TEEs paired with co-located accelerators.
Abstract
As Machine Learning (ML) gets applied to security-critical or sensitive domains, there is a growing need for integrity and privacy for outsourced ML computations. A pragmatic solution comes from Trusted Execution Environments (TEEs), which use hardware and software protections to isolate sensitive computations from the untrusted software stack. However, these isolation guarantees come at a price in performance, compared to untrusted alternatives. This paper initiates the study of high performance execution of Deep Neural Networks (DNNs) in TEEs by efficiently partitioning DNN computations between trusted and untrusted devices. Building upon an efficient outsourcing scheme for matrix multiplication, we propose Slalom, a framework that securely delegates execution of all linear layers in a DNN from a TEE (e.g., Intel SGX or Sanctum) to a faster, yet untrusted, co-located processor. We evaluate Slalom by running DNNs in an Intel SGX enclave, which selectively delegates work to an untrusted GPU. For canonical DNNs (VGG16, MobileNet and ResNet variants) we obtain 6x to 20x increases in throughput for verifiable inference, and 4x to 11x for verifiable and private inference.
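
The integrity check behind outsourced matrix multiplication, Freivalds' algorithm, can be illustrated with a minimal sketch. Given a claimed product C = A·B returned by the untrusted device, the verifier picks a random vector r and tests whether A·(B·r) equals C·r. For n×n matrices this takes O(n²) work per round instead of the O(n³) of a naive recomputation, and an incorrect C passes one round with probability at most 1/P. The modulus `P`, the helper names, and the pure-Python matrix representation below are assumptions made for illustration, not the paper's parameters or implementation.

```python
import random

# Illustrative prime modulus (an assumption, not the paper's parameter).
P = 2**31 - 1


def matmul_mod(A, B, p=P):
    """Naive matrix product over Z_p; stands in for the untrusted device."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]


def matvec_mod(M, v, p=P):
    """Matrix-vector product over Z_p."""
    return [sum(m * x for m, x in zip(row, v)) % p for row in M]


def freivalds_check(A, B, C, rounds=1, p=P, rng=random):
    """Probabilistically check that C == A @ B (mod p).

    Each round draws a uniformly random vector r and compares A(Br) with Cr.
    A wrong C survives a single round with probability at most 1/p.
    """
    for _ in range(rounds):
        r = [rng.randrange(p) for _ in range(len(B[0]))]
        left = matvec_mod(A, matvec_mod(B, r, p), p)   # two matrix-vector products
        right = matvec_mod(C, r, p)                    # one matrix-vector product
        if left != right:
            return False
    return True


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = matmul_mod(A, B)                 # honest result
    tampered = [row[:] for row in C]
    tampered[0][0] = (tampered[0][0] + 1) % P
    print(freivalds_check(A, B, C))          # honest product is always accepted
    print(freivalds_check(A, B, tampered))   # rejected except with probability <= 1/P
```

Increasing `rounds` lowers the false-acceptance probability geometrically, at the cost of extra matrix-vector products per round.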

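The private variant relies on blinding inputs before they leave the enclave. The sketch below shows one way additive blinding can work for a single linear layer. It assumes that inputs and weights are already quantized to integers modulo a prime, and that the trusted side can obtain the unblinding factor W·r without revealing r. The trusted side sends x + r to the untrusted device, receives W·(x + r), and subtracts W·r to recover W·x. The modulus `P` and all function names are hypothetical; the summary above does not spell out these details.

```python
import secrets

# Illustrative prime modulus (an assumption, not the paper's parameter).
P = 2**31 - 1


def linear_mod(W, x, p=P):
    """Linear layer y = W x over Z_p."""
    return [sum(w * xi for w, xi in zip(row, x)) % p for row in W]


def blind(x, p=P):
    """Add a fresh uniformly random mask r to x; returns (x + r mod p, r)."""
    r = [secrets.randbelow(p) for _ in x]
    return [(xi + ri) % p for xi, ri in zip(x, r)], r


def unblind(y_blinded, unblinding, p=P):
    """Remove the mask's contribution W r from the blinded output."""
    return [(y - u) % p for y, u in zip(y_blinded, unblinding)]


# Trusted side: quantized weights and a private input.
W = [[2, 3], [5, 7]]
x = [10, 20]
x_blinded, r = blind(x)
u = linear_mod(W, r)                  # unblinding factor W r, kept by the trusted side

# Untrusted side: sees only x_blinded, which is uniformly distributed in Z_p.
y_blinded = linear_mod(W, x_blinded)

# Trusted side: recover the true output W x.
y = unblind(y_blinded, u)
assert y == linear_mod(W, x)
```

Because the mask is uniform over Z_p and used once, the blinded input reveals nothing about x. The mask must be fresh for every input, since reusing r would leak differences between inputs.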