Benchmarking Analytical Query Processing in Intel SGXv2
Adrian Lutsch, Muhammad El-Hindi, Matthias Heinrich, Daniel Ritter, Zsolt István, Carsten Binnig
TL;DR
This paper evaluates analytical query processing inside Intel SGXv2 enclaves and shows that modern, cache-conscious algorithms such as radix joins and SIMD scans can approach native throughput when optimized for SGXv2. It identifies the main sources of overhead, including random memory access, side-channel mitigations (notably for Spectre V4), NUMA placement, and SDK-induced synchronization, and presents practical optimizations such as loop unrolling and instruction reordering that significantly narrow the gap to non-secure execution. The study also finds that SGXv2’s memory and UPI subsystems, while less of a bottleneck than in SGXv1, still require architecture-aware design choices, including careful memory management and attention to cross-NUMA access, to achieve near-native performance for end-to-end query plans. Overall, the findings indicate that SGXv2 can deliver near-native performance for analytical (OLAP) workloads, which makes secure cloud DBMSs more viable, and the paper offers guidance on optimizations and points to future work on security-focused protections.
Abstract
Trusted Execution Environments (TEEs), such as Intel's Software Guard Extensions (SGX), are increasingly being adopted to address trust and compliance issues in the public cloud. Intel SGX's second generation (SGXv2) addresses many limitations of its predecessor (SGXv1), offering the potential for secure and efficient analytical cloud DBMSs. We assess this potential and conduct the first in-depth evaluation study of analytical query processing algorithms inside SGXv2. Our study reveals that, unlike SGXv1, state-of-the-art algorithms like radix joins and SIMD-based scans are a good starting point for achieving high-performance query processing inside SGXv2. However, subtle hardware and software differences still influence code execution inside SGX enclaves and cause substantial overheads. We investigate these differences and propose new optimizations to bring the performance inside enclaves on par with native code execution outside enclaves.
