Confidential Computing on Heterogeneous CPU-GPU Systems: Survey and Future Directions
Qifan Wang, David Oswald
TL;DR
This paper surveys confidential computing on heterogeneous CPU-GPU systems, focusing on GPU TEEs, attack surfaces, and future directions. It covers a spectrum of GPU TEE designs across x86, Arm, and vendor-enabled platforms (e.g., NVIDIA H100), compares their threat models, hardware modifications, TCB size, and overheads, and maps known CPU TEE attacks to GPU contexts. The authors catalog architectural, microarchitectural, physical, and fault-injection attacks on GPUs, synthesize countermeasures, and discuss how design choices (e.g., memory encryption, VM-based isolation, or software-based verification) influence security and practicality. The work emphasizes the need for architecture-aware defenses, rigorous analysis of GPU memory and interconnect vulnerabilities, and continued exploration of secure GPU design to enable secure high-performance computing in cloud and data-center contexts.
Abstract
In recent years, widespread digitalization and rapid data growth have increased the demand for high-performance heterogeneous systems that integrate multiple types of processing units, such as CPUs, Graphics Processing Units (GPUs), Application-Specific Integrated Circuits (ASICs), and Field-Programmable Gate Arrays (FPGAs). The combination of CPU and GPU is particularly popular due to its versatility. However, these heterogeneous systems face significant security and privacy risks. Advances in privacy-preserving techniques, especially hardware-based Trusted Execution Environments (TEEs), offer effective protection for GPU applications. Nonetheless, the security risks involved in extending TEEs to GPUs in heterogeneous systems remain poorly understood and need further investigation. To investigate these risks in depth, we study existing popular GPU TEE designs and summarize and compare their key implications. Additionally, we review existing powerful attacks on GPUs and on traditional TEEs deployed on CPUs, along with efforts to mitigate these threats. We identify potential attack surfaces introduced by GPU TEEs and provide insights into key considerations for designing secure GPU TEEs. This survey is timely because new TEEs for heterogeneous systems, particularly GPUs, are under active development, which makes it important to understand potential security threats and to build systems that are both efficient and secure.
