Scalable GPU-Based Integrity Verification for Large Machine Learning Models
Marcin Spoczynski, Marcela S. Melara
TL;DR
This work tackles the scalability and security gaps in integrity verification for large ML models by co-locating GPU-based cryptographic hashing and attestation with model execution. It proposes GPU-native integrity verification powered by SYCL kernels, Merkle-tree structures, and hardware attestation through Intel TDX and the forthcoming TDX Connect to eliminate time-of-check to time-of-use (TOCTOU) vulnerabilities and CPU bottlenecks. The main contributions include scalable GPU-accelerated hashing (SHA-256/SHA-384), a hierarchical verification framework, and practical integrations with Atlas and PyTorch for production workflows, along with validation across multiple GPU architectures and model sizes. The approach promises real-time or near-real-time verification for models exceeding 100 GB, enabling continuous security monitoring and stronger supply-chain integrity across multi-stakeholder deployments, while maintaining compatibility with established security standards and industry practices.
Abstract
We present a security framework that strengthens distributed machine learning by standardizing integrity protections across CPU and GPU platforms and significantly reducing verification overheads. Our approach co-locates integrity verification directly with large ML model execution on GPU accelerators. This resolves the fundamental mismatch between how large ML workloads typically run (primarily on GPUs) and how security verification traditionally operates (in separate CPU-based processes), delivering both immediate performance benefits and long-term architectural consistency. By performing cryptographic operations natively on GPUs using dedicated compute units (e.g., Intel Arc's XMX units, NVIDIA's Tensor Cores), our solution avoids the architectural bottlenecks that can limit traditional CPU-based verification systems when dealing with large models. The approach leverages the same high memory bandwidth and parallel processing primitives that power ML workloads, ensuring that integrity checks keep pace with model execution even for massive models exceeding 100 GB. The framework establishes a common integrity verification mechanism that works consistently across different GPU vendors and hardware configurations. By anticipating future capabilities for creating secure channels between trusted execution environments and GPU accelerators, we provide a hardware-agnostic foundation that enterprise teams can deploy regardless of their underlying CPU and GPU infrastructure.
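To make the Merkle-tree idea mentioned in the TL;DR concrete, the following is a minimal CPU-side sketch in Python, not the SYCL kernels described in this work. It splits a byte buffer into fixed-size chunks, hashes each chunk with SHA-256, and combines the digests pairwise into a single root. The chunk size, the function names, the 0x00/0x01 domain-separation prefixes (borrowed from RFC 6962), and the rule of promoting an unpaired node unchanged are all assumptions made for illustration; the paper's actual tree layout may differ.

```python
import hashlib

# Hypothetical chunk size; a real deployment would likely use far larger chunks.
CHUNK_SIZE = 16


def leaf_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Hash each fixed-size chunk independently (the step that parallelizes well)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    # 0x00 prefix marks a leaf (assumed convention, as in RFC 6962).
    return [hashlib.sha256(b"\x00" + chunk).digest() for chunk in chunks]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Combine leaf digests pairwise until a single root digest remains."""
    level = list(leaves)
    while len(level) > 1:
        # 0x01 prefix marks an interior node (assumed convention).
        nxt = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # promote the unpaired node unchanged
        level = nxt
    return level[0]


def model_root(data: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Root digest for a whole buffer, e.g. serialized model weights."""
    return merkle_root(leaf_hashes(data, chunk_size))


if __name__ == "__main__":
    weights = bytes(range(64))  # stand-in for model weights
    print(model_root(weights).hex())
```

Because each leaf digest depends only on its own chunk, the leaf level can be computed in parallel, which is the property a GPU implementation would exploit; changing any single byte changes the corresponding leaf and therefore the root.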
