
Data-Oblivious ML Accelerators using Hardware Security Extensions

Hossam ElAtali, John Z. Jekel, Lachlan J. Gunn, N. Asokan

Abstract

Outsourced computation can put client data confidentiality at risk. Existing solutions are either inefficient or insufficiently secure: cryptographic techniques like fully-homomorphic encryption incur significant overheads, even with hardware assistance, while the complexity of hardware-assisted trusted execution environments has been exploited to leak secret data. Recent proposals such as BliMe and OISA show how dynamic information flow tracking (DIFT) enforced in hardware can protect client data efficiently. They are designed to protect CPU-only workloads. However, many outsourced computing applications, like machine learning, make extensive use of accelerators. We address this gap with Dolma, which applies DIFT to the Gemmini matrix multiplication accelerator, efficiently guaranteeing client data confidentiality, even in the presence of malicious/vulnerable software and side channel attacks on the server. We show that accelerators can allow DIFT logic optimizations that significantly reduce area overhead compared with general-purpose processor architectures. Dolma is integrated with the BliMe framework to achieve end-to-end security guarantees. We evaluate Dolma on an FPGA using a ResNet-50 DNN model and show that it incurs low overheads for large configurations ($4.4\%$, $16.7\%$, $16.5\%$ for performance, resource usage and power, respectively, with a 32x32 configuration).

Paper Structure (26 sections, 3 equations, 5 figures, 1 table)

This paper contains 26 sections, 3 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: System Overview. The client encrypts and sends their secret data to the untrusted software on the server ①, which calls BliMe's data import operation ②. BliMe decrypts-and-blinds the data, tagging it with the session-specific tag, and stores it in memory ③. The untrusted software can then use RoCC instructions ④ to make Dolma operate on the blinded data. Dolma accesses the data and enforces the security policy ⑤. Once the processing is complete, the untrusted software can call BliMe's data export function to encrypt-and-unblind the data, and then send it back to the client ⑥.
  • Figure 2: in parallel for the weight-stationary data flow inside a 2x2 systolic array. The top half of the figure shows the case where $A$ is blinded. The bottom half shows the case where $D$ is blinded. The subfigures from left to right show how computation proceeds over successive cycles. Secret values are shown in red, and the tags corresponding to them are shown as striped. Note that the tiles and registers themselves do not carry any additional logic. Output tags are calculated before input values enter the systolic array and propagate alongside the corresponding secret input and intermediate values. The propagation is synchronized such that output rows receive the correct tag. In the case where $B$ is blinded (not shown here), all output from the systolic array would be blinded since $B$ is preloaded into the array.
  • Figure 3: Writes are pipelined to maintain throughput while enabling read-check-write to prevent tag mixing. A 2-stage pipeline is implemented. The dashed line visualizes the separation between the two stages.
  • Figure 4: Performance results of running ResNet-50 image classification for Dolma relative to unmodified Gemmini. We obtain an average overhead of 5.6% over all three configurations.
  • Figure 5: Resource usage results for Dolma relative to unmodified Gemmini.
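
The tag propagation rule described in the Figure 2 caption can be illustrated with a small software model. This is a minimal sketch, not Dolma's hardware logic, and it rests on several assumptions that are not stated in the captions: the array computes $C = A \cdot B + D$ (Gemmini's usual matrix operation), tags are tracked per matrix row, `None` stands for "not blinded", and an attempt to combine two different session tags is rejected with an error. The function name `output_row_tags` and the tag encoding are hypothetical.

```python
from typing import List, Optional

# Assumption: a tag is either None (data not blinded) or an integer
# identifying the session that owns the blinded data.
Tag = Optional[int]


def output_row_tags(a_tags: List[Tag], b_tag: Tag, d_tags: List[Tag]) -> List[Tag]:
    """Return one tag per output row of C = A @ B + D.

    a_tags[i] and d_tags[i] are the tags of row i of A and D.
    b_tag is a single tag for the preloaded (stationary) matrix B.
    The tags are computed from the input tags alone, before any data
    values are touched, mirroring the caption's statement that output
    tags are calculated before inputs enter the array.
    """
    if len(a_tags) != len(d_tags):
        raise ValueError("A and D must have the same number of rows")

    out: List[Tag] = []
    for a_tag, d_tag in zip(a_tags, d_tags):
        # Output row i depends on row i of A, row i of D, and all of B.
        contributing = {t for t in (a_tag, b_tag, d_tag) if t is not None}
        if len(contributing) > 1:
            # Assumption: mixing different session tags is refused.
            raise ValueError("inputs carry different tags")
        out.append(contributing.pop() if contributing else None)
    return out


if __name__ == "__main__":
    # Top half of Figure 2: a row of A is blinded.
    print(output_row_tags([7, None], None, [None, None]))
    # Bottom half of Figure 2: a row of D is blinded.
    print(output_row_tags([None, None], None, [None, 7]))
    # B blinded: every output row is blinded.
    print(output_row_tags([None, None], 7, [None, None]))
```

Because the output tags depend only on the input tags, the model never inspects secret values, which is consistent with the caption's note that the tiles and registers themselves need no additional logic.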