Table of Contents
Fetching ...

Fastrack: Fast IO for Secure ML using GPU TEEs

Yongqin Wang, Rachit Rajat, Jonghyun Lee, Tingting Tang, Murali Annavaram

TL;DR

This paper analyzes Nvidia H100 TEE protocols and proposes Fastrack, optimizing with 1) direct GPU TEE communication, 2) parallelized authentication, and 3) overlapping decryption with PCI-e transmission that cut communication costs and reduce inference/training runtime by up to 84.6%, with minimal overhead compared to non-TEE systems.

Abstract

As cloud-based ML expands, ensuring data security during training and inference is critical. GPU-based Trusted Execution Environments (TEEs) offer secure, high-performance solutions, with CPU TEEs managing data movement and GPU TEEs handling authentication and computation. However, CPU-to-GPU communication overheads significantly hinder performance, as data must be encrypted, authenticated, decrypted, and verified, increasing costs by 12.69 to 33.53 times. This results in GPU TEE inference becoming 54.12% to 903.9% slower and training 10% to 455% slower than non-TEE systems, undermining GPU TEE advantages in latency-sensitive applications. This paper analyzes Nvidia H100 TEE protocols and identifies three key overheads: 1) redundant CPU re-encryption, 2) limited authentication parallelism, and 3) unnecessary operation serialization. We propose Fastrack, optimizing with 1) direct GPU TEE communication, 2) parallelized authentication, and 3) overlapping decryption with PCI-e transmission. These optimizations cut communication costs and reduce inference/training runtime by up to 84.6%, with minimal overhead compared to non-TEE systems.

Fastrack: Fast IO for Secure ML using GPU TEEs

TL;DR

This paper analyzes Nvidia H100 TEE protocols and proposes Fastrack, optimizing with 1) direct GPU TEE communication, 2) parallelized authentication, and 3) overlapping decryption with PCI-e transmission that cut communication costs and reduce inference/training runtime by up to 84.6%, with minimal overhead compared to non-TEE systems.

Abstract

As cloud-based ML expands, ensuring data security during training and inference is critical. GPU-based Trusted Execution Environments (TEEs) offer secure, high-performance solutions, with CPU TEEs managing data movement and GPU TEEs handling authentication and computation. However, CPU-to-GPU communication overheads significantly hinder performance, as data must be encrypted, authenticated, decrypted, and verified, increasing costs by 12.69 to 33.53 times. This results in GPU TEE inference becoming 54.12% to 903.9% slower and training 10% to 455% slower than non-TEE systems, undermining GPU TEE advantages in latency-sensitive applications. This paper analyzes Nvidia H100 TEE protocols and identifies three key overheads: 1) redundant CPU re-encryption, 2) limited authentication parallelism, and 3) unnecessary operation serialization. We propose Fastrack, optimizing with 1) direct GPU TEE communication, 2) parallelized authentication, and 3) overlapping decryption with PCI-e transmission. These optimizations cut communication costs and reduce inference/training runtime by up to 84.6%, with minimal overhead compared to non-TEE systems.

Paper Structure

This paper contains 23 sections, 12 figures, 2 tables.

Figures (12)

  • Figure 1: GPU confidential computing system topology.
  • Figure 2: ML workflow with GPU TEEs
  • Figure 3: AES-CTR mode to encrypt $m$ blocks of plaintext.
  • Figure 4: GMAC used in AES-GCM MAC tag generation.
  • Figure 5: Baseline computation and communication flow.
  • ...and 7 more figures