Enabling Low-Cost Secure Computing on Untrusted In-Memory Architectures

Sahar Ghoflsaz Ghinani, Jingyao Zhang, Elaheh Sadredini

TL;DR

This paper tackles the memory bottleneck by enabling secure computing on untrusted Processing-in-Memory (PIM) hardware. It introduces a multi-party computation framework that combines arithmetic secret sharing for linear operations with Yao’s garbled circuits for nonlinear operations, augmented by a precomputation strategy to avoid CPU bottlenecks. The approach uses counter-mode encryption and MAC-based verification to maintain data confidentiality and integrity, and it is evaluated on real UPMEM hardware across workloads including MLP, DLRM, linear regression, and logistic regression, achieving up to 14.66× speedups over secure CPU baselines. The results demonstrate that secure PIM can significantly accelerate data-intensive tasks without compromising security, and the framework is extensible to additional operations like GEMM and Convolution in larger neural networks.

Abstract

Modern computing systems are limited in performance by the memory bandwidth available to processors, a problem known as the memory wall. Processing-in-Memory (PIM) promises to substantially alleviate this problem by moving processing closer to the data, improving effective data bandwidth, and leading to superior performance on memory-intensive workloads. However, integrating PIM modules within a secure computing system raises an interesting challenge: unencrypted data has to move off-chip to the PIM, exposing the data to attackers and breaking assumptions on Trusted Computing Bases (TCBs). To tackle this challenge, this paper leverages multi-party computation (MPC) techniques, specifically arithmetic secret sharing and Yao's garbled circuits, to outsource bandwidth-intensive computation securely to PIM. Additionally, we leverage precomputation optimization to prevent the CPU's portion of the MPC from becoming a bottleneck. We evaluate our approach using the UPMEM PIM system over various applications such as Deep Learning Recommendation Model inference and Logistic Regression. Our evaluations demonstrate up to a 14.66× speedup compared to a secure CPU configuration while maintaining data confidentiality and integrity when outsourcing linear and/or nonlinear computation.

Paper Structure

This paper contains 41 sections, 9 equations, 19 figures, and 2 tables.

Figures (19)

  • Figure 1: (a) TEE-based system. Only on-chip modules (e.g., cores, cache) are within the TCB. The PIM module cannot operate on encrypted data. (b) TEE-based system with MPC. Private data can be split into two shares, one on-chip and one off-chip. The PIM module can operate on the computable MPC share.
  • Figure 2: Speedup of the SecNDP security scheme using UPMEM over an insecure CPU for the GEMV kernel with different input sizes. SecNDP does not perform well when the amount of public data is large.
  • Figure 3: Proposed threat model. TEE is the trusted party, and both standard and PIM-enabled memories are untrusted. An attacker can access or modify the data within these memories and the bus.
  • Figure 4: Overview of the UPMEM Architecture.
  • Figure 5: (a) Counter-mode Encryption: Generated One-Time Pads are XORed with the plaintext to produce the ciphertext. (b) Arithmetic Secret Sharing: The trusted party distributes the computation among multiple untrusted parties. Partial results are retrieved and aggregated to achieve the final result.
  • ...and 14 more figures
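
The arithmetic secret sharing described in the Figure 5(b) caption can be illustrated with a minimal sketch. It is not the paper's implementation: the ring size `MOD`, the two-share split, the choice of which operand is private, and all function names are assumptions made for illustration, and an ordinary Python function stands in for the PIM module. The idea is that a linear operation distributes over additive shares, so each party can compute a partial result on its own share and the trusted party recovers the true result by adding the partials.

```python
import secrets

MOD = 2 ** 32  # assumed ring size; shares live in the integers modulo 2^32


def split(values):
    """Split each value into two additive shares modulo MOD (hypothetical helper)."""
    on_chip = [secrets.randbelow(MOD) for _ in values]  # uniformly random share
    off_chip = [(v - r) % MOD for v, r in zip(values, on_chip)]
    return on_chip, off_chip


def matvec(matrix, vector):
    """Matrix-vector product modulo MOD, i.e. a linear operation."""
    return [sum(a * x for a, x in zip(row, vector)) % MOD for row in matrix]


def secure_matvec(matrix, vector):
    """Compute matrix @ vector while only a random-looking share leaves the chip.

    Assumption for this sketch: `matrix` is public and `vector` is private.
    """
    on_chip, off_chip = split(vector)
    partial_trusted = matvec(matrix, on_chip)     # computed by the trusted party
    partial_untrusted = matvec(matrix, off_chip)  # stands in for the PIM module
    # Aggregating the partial results yields the true product modulo MOD.
    return [(a + b) % MOD for a, b in zip(partial_trusted, partial_untrusted)]


weights = [[1, 2, 3], [4, 5, 6]]
private_input = [7, 8, 9]
result = secure_matvec(weights, private_input)
```

Here `result` equals the ordinary product of `weights` and `private_input`, because every value involved is far below `MOD`. The off-chip share on its own is uniformly distributed, so it reveals nothing about `private_input`. In this sketch the trusted side does as much work as the untrusted side; the abstract's precomputation optimization is aimed at keeping that CPU portion from becoming the bottleneck, and it is not modeled here.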