Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design

Haojie Ye; Yuchen Xia; Yuhan Chen; Kuan-Yu Chen; Yichao Yuan; Shuwen Deng; Baris Kasikci; Trevor Mudge; Nishil Talati

Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design

Haojie Ye, Yuchen Xia, Yuhan Chen, Kuan-Yu Chen, Yichao Yuan, Shuwen Deng, Baris Kasikci, Trevor Mudge, Nishil Talati

TL;DR

The key observation in Palermo is that classical ORAM protocols enforce restrictive dependencies between memory operations that result in low memory bandwidth utilization, and Palermo introduces a new protocol that overlaps large portions of memory operations, within a single and between multiple ORAM requests, without breaking correctness and security guarantees.

Abstract

Oblivious RAM (ORAM) hides the memory access patterns, enhancing data privacy by preventing attackers from discovering sensitive information based on the sequence of memory accesses. The performance of ORAM is often limited by its inherent trade-off between security and efficiency, as concealing memory access patterns imposes significant computational and memory overhead. While prior works focus on improving the ORAM performance by prefetching and eliminating ORAM requests, we find that their performance is very sensitive to workload locality behavior and incurs additional management overhead caused by the ORAM stash pressure. This paper presents Palermo: a protocol-hardware co-design to improve ORAM performance. The key observation in Palermo is that classical ORAM protocols enforce restrictive dependencies between memory operations that result in low memory bandwidth utilization. Palermo introduces a new protocol that overlaps large portions of memory operations, within a single and between multiple ORAM requests, without breaking correctness and security guarantees. Subsequently, we propose an ORAM controller architecture that executes the proposed protocol to service ORAM requests. The hardware is responsible for concurrently issuing memory requests as well as imposing the necessary dependencies to ensure a consistent view of the ORAM tree across requests. Using a rich workload mix, we demonstrate that Palermo outperforms the RingORAM baseline by 2.8x, on average, incurring a negligible area overhead of 5.78mm^2 (less than 2% in 12th generation Intel CPU after technology scaling) and 2.14W without sacrificing security. We further show that Palermo also outperforms the state-of-the-art works PageORAM, PrORAM, and IR-ORAM.

Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design

TL;DR

Abstract

Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (15)