Scalable GPU-Based Integrity Verification for Large Machine Learning Models

Marcin Spoczynski, Marcela S. Melara

TL;DR

This work tackles the scalability and security gaps in integrity verification for large ML models by co-locating GPU-based cryptographic hashing and attestation with model execution. It proposes GPU-native integrity verification powered by SYCL kernels, Merkle-tree structures, and hardware attestation through Intel TDX and the forthcoming TDX Connect to eliminate TOCTOU vulnerabilities and CPU bottlenecks. The main contributions include scalable GPU-accelerated hashing (SHA-256/SHA-384), a hierarchical verification framework, and practical integrations with Atlas and PyTorch for production workflows, along with validation across multiple GPU architectures and model sizes. The approach promises real-time or near-real-time verification for models exceeding 100 GB, enabling continuous security monitoring and stronger supply-chain integrity across multi-stakeholder deployments, while maintaining compatibility with established security standards and industry practices.
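To make the Merkle-tree idea concrete, the following is a minimal CPU-side sketch in Python, not the paper's SYCL implementation. It assumes fixed-size chunks, SHA-256 leaves, and duplication of the last node on odd-sized levels; the chunk size and the names `leaf_hashes`, `merkle_root`, and `verify` are hypothetical choices made for illustration. The per-chunk hashing step is the part that a GPU kernel would run in parallel.

```python
import hashlib
import hmac

# Hypothetical chunk size (1 MiB); the value used by the paper is not stated here.
CHUNK_SIZE = 1 << 20


def leaf_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Hash each fixed-size chunk independently.

    Each chunk digest depends only on its own bytes, so this loop is the
    step that parallel hardware could distribute across compute units.
    """
    if not data:
        return [hashlib.sha256(b"").digest()]
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Combine digests pairwise, level by level, until one root remains."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last node on odd-sized levels.
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def verify(data: bytes, expected_root: bytes, chunk_size: int = CHUNK_SIZE) -> bool:
    """Recompute the root and compare it to a trusted reference value."""
    actual = merkle_root(leaf_hashes(data, chunk_size))
    return hmac.compare_digest(actual, expected_root)
```

In this sketch a single flipped byte in any chunk changes that chunk's leaf digest and therefore the root, so `verify` returns `False` for tampered data. A production design would also need to distinguish leaf and interior nodes (for example with domain-separation prefixes), which is omitted here for brevity.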

Abstract

We present a security framework that strengthens distributed machine learning by standardizing integrity protections across CPU and GPU platforms and significantly reducing verification overheads. Our approach co-locates integrity verification directly with large ML model execution on GPU accelerators. This resolves the fundamental mismatch between how large ML workloads typically run (primarily on GPUs) and how security verification traditionally operates (in separate CPU-based processes), delivering both immediate performance benefits and long-term architectural consistency. By performing cryptographic operations natively on GPUs using dedicated compute units (e.g., Intel Arc's XMX units, NVIDIA's Tensor Cores), our solution eliminates the potential architectural bottlenecks that could plague traditional CPU-based verification systems when dealing with large models. The approach leverages the same high memory bandwidth and parallel processing primitives that power ML workloads, ensuring that integrity checks keep pace with model execution even for massive models exceeding 100 GB. The framework establishes a common integrity verification mechanism that works consistently across different GPU vendors and hardware configurations. By anticipating future capabilities for creating secure channels between trusted execution environments and GPU accelerators, we provide a hardware-agnostic foundation that enterprise teams can deploy regardless of their underlying CPU and GPU infrastructures.

Paper Structure

This paper contains 79 sections, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: Architectural comparison between current CPU-based and proposed GPU-accelerated model integrity verification systems. The current approach (left) suffers from performance bottlenecks due to CPU-mediated verification and CPU-GPU data movement. The proposed approach (right) co-locates verification and execution within the same GPU platform and Intel TDX trust boundary, eliminating TOCTOU vulnerabilities through unified processing.
  • Figure 2: GPU memory organization for parallel hash computation showing the hierarchical memory layout that enables efficient SYCL kernel execution. Model parameters are transferred from host memory via PCIe to GPU global memory, where specialized buffers organize data for parallel processing across thousands of compute units. The memory hierarchy includes input buffers for message blocks, constants buffers for SHA round values, output buffers for hash results, and shared memory for work-group coordination, achieving optimal memory bandwidth utilization and parallel throughput.
  • Figure 3: Time-of-Check-Time-of-Use vulnerability in current CPU-mediated model verification systems. The temporal and spatial separation between verification and pipeline execution creates attack opportunities.
  • Figure 4: Proposed integration of GPU-accelerated integrity verification with Intel TDX trust boundaries. Upcoming Intel TDX Connect capabilities enable unified trust domains spanning CPU and GPU resources.
  • Figure 5: Multi-GPU verification architecture showing distributed hash computation and Merkle tree construction. GPU-to-GPU communication enables efficient coordination without CPU bottlenecks.
  • ...and 1 more figure