Benchmarking Analytical Query Processing in Intel SGXv2

Adrian Lutsch; Muhammad El-Hindi; Matthias Heinrich; Daniel Ritter; Zsolt István; Carsten Binnig

TL;DR

This paper evaluates analytical query processing inside Intel SGXv2 enclaves, demonstrating that modern, cache-conscious algorithms such as radix joins and SIMD scans can approach native throughput when properly optimized for SGXv2. It identifies key overhead sources, including random memory access, side-channel mitigations (notably Spectre V4), NUMA placement, and SDK-induced synchronization, and presents practical optimizations like loop unrolling and instruction reordering that significantly close the gap to non-secure execution. The study also reveals that SGXv2’s memory and UPI subsystems, while less bottlenecked than SGXv1, still require architecture-aware design choices to achieve near-native performance for end-to-end query plans, including careful memory management and cross-NUMA considerations. Overall, the findings indicate SGXv2 can deliver near-native analytic performance for OLAP workloads, making secure cloud DBMSs more viable, with clear guidance on optimizations and potential future work in security-focused protections.

Abstract

Trusted Execution Environments (TEEs), such as Intel's Software Guard Extensions (SGX), are increasingly being adopted to address trust and compliance issues in the public cloud. Intel SGX's second generation (SGXv2) addresses many limitations of its predecessor (SGXv1), offering the potential for secure and efficient analytical cloud DBMSs. We assess this potential and conduct the first in-depth evaluation study of analytical query processing algorithms inside SGXv2. Our study reveals that, unlike SGXv1, state-of-the-art algorithms like radix joins and SIMD-based scans are a good starting point for achieving high-performance query processing inside SGXv2. However, subtle hardware and software differences still influence code execution inside SGX enclaves and cause substantial overheads. We investigate these differences and propose new optimizations to bring the performance inside enclaves on par with native code execution outside enclaves.
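The summary above names loop unrolling and instruction reordering as optimizations for code running inside enclaves. As a rough illustration only, the sketch below applies that idea to the histogram step of a radix partitioning pass: the unrolled variant computes four bucket indexes first and performs the four counter updates afterwards. This is not the authors' code. The function names (`bucket_of`, `histogram_simple`, `histogram_unrolled`), the 32-bit key type, and the unroll factor of 4 are assumptions chosen for the example, and whether this particular form helps on a given SGXv2 machine would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bucket index taken from the low radix bits of a key.
inline std::uint32_t bucket_of(std::uint32_t key, std::uint32_t mask) {
    return key & mask;
}

// Baseline histogram: one index computation and one counter update per key.
std::vector<std::uint32_t> histogram_simple(const std::vector<std::uint32_t>& keys,
                                            unsigned radix_bits) {
    assert(radix_bits > 0 && radix_bits < 32);
    const std::uint32_t mask = (1u << radix_bits) - 1u;
    std::vector<std::uint32_t> hist(std::size_t{1} << radix_bits, 0);
    for (std::uint32_t key : keys) {
        ++hist[bucket_of(key, mask)];
    }
    return hist;
}

// Unrolled and reordered variant (assumed unroll factor: 4).
// All four bucket indexes are computed before any counter is written,
// so the key loads are grouped ahead of the stores.
std::vector<std::uint32_t> histogram_unrolled(const std::vector<std::uint32_t>& keys,
                                              unsigned radix_bits) {
    assert(radix_bits > 0 && radix_bits < 32);
    const std::uint32_t mask = (1u << radix_bits) - 1u;
    std::vector<std::uint32_t> hist(std::size_t{1} << radix_bits, 0);
    const std::size_t n = keys.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const std::uint32_t b0 = bucket_of(keys[i], mask);
        const std::uint32_t b1 = bucket_of(keys[i + 1], mask);
        const std::uint32_t b2 = bucket_of(keys[i + 2], mask);
        const std::uint32_t b3 = bucket_of(keys[i + 3], mask);
        ++hist[b0];
        ++hist[b1];
        ++hist[b2];
        ++hist[b3];
    }
    // Remainder loop for the last n % 4 keys.
    for (; i < n; ++i) {
        ++hist[bucket_of(keys[i], mask)];
    }
    return hist;
}
```

Both functions return the same counts for any input, so the unrolled version can be swapped in and benchmarked against the baseline inside and outside an enclave without changing the rest of the partitioning code.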

Paper Structure (57 sections, 17 figures, 2 tables)

This paper contains 57 sections, 17 figures, 2 tables.

Introduction
The need for secure cloud DBMSs.
TEEs to the rescue?
Security does not come for free.
Recent advances of SGX lift limitations.
The need for a performance study of SGXv2.
Focus on analytical query processing.
Contribution and main findings.
Outline.
Intel SGXv2 Background
Integrity and confidentiality in SGX.
Major differences in SGXv2.
Implications of SGXv2 for DBMSs.
Benchmark Overview
Benchmarking settings.
...and 42 more sections

Figures (17)

Figure 1: Performance of joining a 100 (hash) and a 400 (probe) table inside an SGXv2 enclave. The join designed for SGXv1 does not achieve competitive performance (blue). A state-of-the-art radix join is a better starting point (orange), and with our optimization (green), its performance is similar to outside the enclave (red).
Figure 2: Intel SGX implements enclaves via a protected memory region in RAM, called PRM. Data and code of enclaves are stored in encrypted memory pages inside the EPC. They are decrypted when loaded into the cache. The UCE encrypts enclave UPI traffic.
Figure 3: Overview of join algorithm throughput for 5 different joins executed using 16 threads to join a 100 and a 400 table on SGXv2 hardware. The SGXv1-optimized CrkJoin is the slowest join in this comparison. The hash joins have the highest slowdowns.
Figure 4: Left: Throughput of a single-threaded hash join with data and execution inside an SGXv2 enclave (DiE) relative to plain CPU. Join performance with large hash tables suffers from random access overhead. Right: Comparison of join phase runtimes at 100 hash size. The slowdown of the build phase inside the enclave is significant.
Figure 5: Performance of random memory reads and writes in an SGX enclave relative to plain CPU. In the cache, random access performance is equal. Random accesses to main memory are significantly slower in SGXv2.
...and 12 more figures
