Benchmarking Analytical Query Processing in Intel SGXv2

Adrian Lutsch, Muhammad El-Hindi, Matthias Heinrich, Daniel Ritter, Zsolt István, Carsten Binnig

TL;DR

This paper evaluates analytical query processing inside Intel SGXv2 enclaves and shows that modern, cache-conscious algorithms such as radix joins and SIMD scans can approach native throughput when optimized for SGXv2. It identifies the main sources of overhead, including random memory access, side-channel mitigations (notably for Spectre V4), NUMA placement, and SDK-induced synchronization, and presents practical optimizations such as loop unrolling and instruction reordering that significantly narrow the gap to execution outside the enclave. The study also finds that SGXv2’s memory and UPI subsystems, while less of a bottleneck than those of SGXv1, still require architecture-aware design choices, including careful memory management and attention to cross-NUMA traffic, to reach near-native performance for end-to-end query plans. Overall, the findings indicate that SGXv2 can deliver near-native analytical performance for OLAP workloads, which makes secure cloud DBMSs more viable. The paper gives concrete guidance on these optimizations and points to future work on security-focused protections.

Abstract

Trusted Execution Environments (TEEs), such as Intel's Software Guard Extensions (SGX), are increasingly being adopted to address trust and compliance issues in the public cloud. Intel SGX's second generation (SGXv2) addresses many limitations of its predecessor (SGXv1), offering the potential for secure and efficient analytical cloud DBMSs. We assess this potential and conduct the first in-depth evaluation study of analytical query processing algorithms inside SGXv2. Our study reveals that, unlike SGXv1, state-of-the-art algorithms like radix joins and SIMD-based scans are a good starting point for achieving high-performance query processing inside SGXv2. However, subtle hardware and software differences still influence code execution inside SGX enclaves and cause substantial overheads. We investigate these differences and propose new optimizations to bring the performance inside enclaves on par with native code execution outside enclaves.

Paper Structure (57 sections, 17 figures, 2 tables)

Figures (17)

  • Figure 1: Performance of joining a 100 MB (hash) and a 400 MB (probe) table inside an SGXv2 enclave. The join designed for SGXv1 does not achieve competitive performance (blue). A state-of-the-art radix join is a better starting point (orange), and with our optimization (green), its performance is similar to outside the enclave (red).
  • Figure 2: Intel SGX implements enclaves via a protected memory region in RAM, called PRM. Data and code of enclaves are stored in encrypted memory pages inside the EPC. They are decrypted when loaded into the cache. The UCE encrypts enclave UPI traffic.
  • Figure 3: Overview of join algorithm throughput for 5 different joins executed using 16 threads to join a 100 MB and a 400 MB table on SGXv2 hardware. The SGXv1-optimized CrkJoin is the slowest join in this comparison. The hash joins have the highest slowdowns.
  • Figure 4: Left: Throughput of a single-threaded hash join with data and execution inside an SGXv2 enclave (DiE) relative to plain CPU. Join performance with large hash tables suffers from random access overhead. Right: Comparison of join phase runtimes at 100 MB hash size. The slowdown of the build phase inside the enclave is significant.
  • Figure 5: Performance of random memory reads and writes in an SGX enclave relative to plain CPU. In the cache, random access performance is equal. Random accesses to main memory are significantly slower in SGXv2.
  • ...and 12 more figures