Data-Oblivious ML Accelerators using Hardware Security Extensions

Hossam ElAtali; John Z. Jekel; Lachlan J. Gunn; N. Asokan

Abstract

Outsourced computation can put client data confidentiality at risk. Existing solutions are either inefficient or insufficiently secure: cryptographic techniques like fully homomorphic encryption incur significant overheads, even with hardware assistance, while the complexity of hardware-assisted trusted execution environments has been exploited to leak secret data. Recent proposals such as BliMe and OISA show how dynamic information flow tracking (DIFT) enforced in hardware can protect client data efficiently. However, they are designed to protect CPU-only workloads, whereas many outsourced computing applications, like machine learning, make extensive use of accelerators. We address this gap with Dolma, which applies DIFT to the Gemmini matrix multiplication accelerator, efficiently guaranteeing client data confidentiality, even in the presence of malicious or vulnerable software and side-channel attacks on the server. We show that accelerators can allow DIFT logic optimizations that significantly reduce area overhead compared with general-purpose processor architectures. Dolma is integrated with the BliMe framework to achieve end-to-end security guarantees. We evaluate Dolma on an FPGA using a ResNet-50 DNN model and show that it incurs low overheads for large configurations ($4.4\%$, $16.7\%$, $16.5\%$ for performance, resource usage and power, respectively, with a 32x32 configuration).

Paper Structure (26 sections, 3 equations, 5 figures, 1 table)

Introduction
Background
Side channels
BliMe
Gemmini
Assumptions & Threat Model
Design & Implementation
Overview
Tag bits
RoCC commands
DIFT in the systolic array
Scratchpads & context switches
Tag mixing
Read-Check-Write
Activation functions
...and 11 more sections

Figures (5)

Figure 1: System Overview. The client encrypts and sends their secret data to the untrusted software on the server ①, which calls BliMe's data import operation ②. BliMe decrypts-and-blinds the data, tagging it with the session-specific tag, and stores it in memory ③. The untrusted software can then use RoCC instructions ④ to make Dolma operate on the blinded data. Dolma accesses the data and enforces the security policy ⑤. Once the processing is complete, the untrusted software can call BliMe's data export function to encrypt-and-unblind the data, and then send it back to the client ⑥.
Figure 2: in parallel for the weight-stationary data flow inside a 2x2 systolic array. The top half of the figure shows the case where $A$ is blinded. The bottom half shows the case where $D$ is blinded. The subfigures from left to right show how computation proceeds over successive cycles. Secret values are shown in red. Note that the tiles and registers themselves do not carry any additional logic. The tags corresponding to the secret values are shown as striped. Output tags are calculated before input values enter the systolic array and propagate alongside the corresponding secret input and intermediate values. The propagation is synchronized such that output rows receive the correct tag. In the case where $B$ is blinded (not shown here), all output from the systolic array would be blinded since $B$ is preloaded into the array.
Figure 3: Writes are pipelined to maintain throughput while enabling read-check-write to prevent tag mixing. A 2-stage pipeline is implemented. The dashed line visualizes the separation between the two stages.
Figure 4: Performance results of running ResNet-50 image classification for Dolma relative to unmodified Gemmini. We obtain an average overhead of 5.6% over all three configurations.
Figure 5: Resource usage results for Dolma relative to unmodified Gemmini.
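
The tag propagation rule described in the Figure 2 caption can be illustrated with a small software model. This is only a sketch under stated assumptions, not Dolma's hardware logic: it assumes the array computes $A \cdot B + D$, that tags are single booleans kept per matrix row (the real design uses session-specific tags), and the function names `output_row_tags` and `matmul_add` are hypothetical. It shows output tags being derived from input tags alone, before any data is processed: a blinded row of $A$ or $D$ blinds the matching output row, while any blinded row of the preloaded $B$ blinds every output row.

```python
from typing import List

Matrix = List[List[int]]


def output_row_tags(a_tags: List[bool], b_tags: List[bool],
                    d_tags: List[bool]) -> List[bool]:
    """Derive one tag per output row from the input row tags only.

    Hypothetical row-level model: the tags never depend on data values.
    """
    if len(a_tags) != len(d_tags):
        raise ValueError("A and D must have the same number of rows")
    # B is preloaded into the array, so a blinded B taints every output row.
    b_blinded = any(b_tags)
    return [b_blinded or a or d for a, d in zip(a_tags, d_tags)]


def matmul_add(a: Matrix, b: Matrix, d: Matrix) -> Matrix:
    """Plain reference computation of A @ B + D on lists of lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) + d[i][j]
         for j in range(cols)]
        for i in range(len(a))
    ]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    d = [[1, 1], [1, 1]]
    # Tags are computed first, then the data is processed.
    tags = output_row_tags([True, False], [False, False], [False, False])
    result = matmul_add(a, b, d)
    for row, blinded in zip(result, tags):
        print(row, "blinded" if blinded else "public")
```

In this model the tag computation and the data computation are fully decoupled, which mirrors the caption's point that the tiles and registers themselves carry no additional logic.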
