Sentry: Authenticating Machine Learning Artifacts on the Fly

Andrew Gan, Zahra Ghodsi

TL;DR

The paper tackles the risk of supply-chain attacks on open-source ML artifacts by introducing Sentry, a GPU-based framework that authenticates datasets and models on the fly as they are loaded into GPU memory. It couples Sigstore-style attestation with GPU-accelerated cryptographic hashing (Merkle tree and lattice hashing) to provide high-throughput, end-to-end artifact verification compatible with GPUDirect data movement. Key contributions include GPU implementations of Merkle-tree and lattice hashing, per-layer and per-sample authentication options, integration with NVIDIA DALI for data processing, and a Python library that enables seamless end-to-end signing and verification. The results show substantial speedups over CPU baselines (up to hundreds of times faster) and modest storage overhead, making on-demand authenticity practical for large ML artifacts in real-world pipelines.

Abstract

Machine learning systems increasingly rely on open-source artifacts such as datasets and models that are created or hosted by other parties. The reliance on external datasets and pre-trained models exposes the system to supply chain attacks where an artifact can be poisoned before it is delivered to the end user. Such attacks are possible due to the lack of any authenticity verification in existing machine learning systems. Incorporating cryptographic solutions such as hashing and signing can mitigate the risk of supply chain attacks. However, existing frameworks for integrity verification based on cryptographic techniques can incur significant overhead when applied to state-of-the-art machine learning artifacts due to their scale, and are not compatible with GPU platforms. In this paper, we develop Sentry, a novel GPU-based framework that verifies the authenticity of machine learning artifacts by implementing cryptographic signing and verification for datasets and models. Sentry ties developer identities to signatures and performs authentication on the fly as artifacts are loaded into GPU memory, making it compatible with GPU data movement solutions such as NVIDIA GPUDirect that bypass the CPU. Sentry incorporates GPU acceleration of cryptographic hash constructions such as Merkle tree and lattice hashing, implementing memory optimizations and resource partitioning schemes to achieve high throughput. Our evaluations show that Sentry is a practical solution to bring authenticity to machine learning systems, achieving orders of magnitude speedup over a CPU-based baseline.
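
To make the Merkle tree construction mentioned in the abstract concrete, the following is a minimal CPU-only sketch in Python. It is not Sentry's GPU implementation: the choice of SHA-256, the `CHUNK_SIZE` value, the domain-separation prefixes, and the rule for odd nodes are all assumptions made for illustration. The point it shows is that leaves can be hashed independently (which is what makes the construction attractive for parallel hardware) and then reduced pairwise to a single root that a signature can cover.

```python
import hashlib

CHUNK_SIZE = 1024  # hypothetical leaf size in bytes, chosen only for this sketch


def _h(data: bytes) -> bytes:
    """One-way compression step; SHA-256 is an assumption, not Sentry's choice."""
    return hashlib.sha256(data).digest()


def merkle_root(blob: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return the Merkle root of `blob`, split into fixed-size chunks."""
    # Split the artifact into leaves; an empty artifact still gets one leaf.
    chunks = [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)] or [b""]

    # Leaf hashes are independent of each other, so they can run in parallel.
    # The 0x00 / 0x01 prefixes keep leaf and internal hashes in separate domains.
    level = [_h(b"\x00" + chunk) for chunk in chunks]

    # Tree reduction: combine neighbours until one node remains.
    while len(level) > 1:
        nxt = [_h(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd node is promoted unchanged
        level = nxt
    return level[0]
```

A verifier would recompute `merkle_root` over the received bytes and compare it with the root carried in the signed payload; any single-byte change in the artifact yields a different root.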

Paper Structure

This paper contains 25 sections, 1 equation, 10 figures, 1 table, 4 algorithms.

Figures (10)

  • Figure 1: Sigstore payload structure.
  • Figure 2: Cryptographic hash constructions based on Merkle-Damgård, Merkle tree, and homomorphic hashing from one-way compression function $h$.
  • Figure 3: High-level description of $\mathsf{Sentry}$ with data provider, model provider, ML hub, and user roles. Model and data providers augment their artifact with an authenticated payload before uploading them to the hub. Any user accessing artifacts on the hub can verify their authenticity on the fly as artifacts are loaded on GPU memory for ML tasks.
  • Figure 4: Depiction of block hashing and tree reduction kernels. Shared memory is used for storing intermediate values within a thread block, global memory for collecting results from different blocks, and constant memory for storing lookup tables with hard-coded values. The input read from global memory is reduced in shared memory, whereas the last intermediate results in shared memory are reduced to the final result in global memory to minimize the number of memory transfers.
  • Figure 5: Depiction of three model hashing implementations. The model is defined in PyTorch using a dictionary state_dict, which maps each layer to the corresponding weight tensor. After the model is loaded on GPU memory, the layers are fragmented across GPU memory space. We propose coalesced, per-layer, and in-place implementations for computing the model hash.
  • ...and 5 more figures
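
The captions for Figures 2 and 5 mention homomorphic hashing and per-layer model hashing. The sketch below illustrates how the two ideas can fit together, in the spirit of lattice-based additive hashing: each layer is expanded into a vector of 16-bit words, and the vectors are added component-wise. Everything here is an assumption for illustration, not Sentry's method or parameters: the SHAKE-256 expansion, `N_WORDS`, the 16-bit modulus, and the plain `dict` of byte strings standing in for a PyTorch `state_dict`.

```python
import hashlib
import struct
from typing import Dict, List

N_WORDS = 1024   # hypothetical vector length
MOD = 1 << 16    # each component is a 16-bit word


def layer_vector(name: str, weights: bytes) -> List[int]:
    """Expand one (layer name, weight bytes) pair into N_WORDS 16-bit words."""
    encoded = name.encode("utf-8")
    # Length-prefix the name so the (name, weights) encoding is unambiguous,
    # and bind the name so identical weights in different layers hash differently.
    xof = hashlib.shake_256(struct.pack(">Q", len(encoded)) + encoded + weights)
    raw = xof.digest(2 * N_WORDS)
    return list(struct.unpack(f"<{N_WORDS}H", raw))


def add(a: List[int], b: List[int]) -> List[int]:
    return [(x + y) % MOD for x, y in zip(a, b)]


def sub(a: List[int], b: List[int]) -> List[int]:
    return [(x - y) % MOD for x, y in zip(a, b)]


def model_hash(state: Dict[str, bytes]) -> List[int]:
    """Sum the per-layer vectors; the result does not depend on layer order."""
    acc = [0] * N_WORDS
    for name, weights in state.items():
        acc = add(acc, layer_vector(name, weights))
    return acc
```

Because the combination is plain modular addition, layers can be hashed in any order or in parallel, and replacing one layer only requires subtracting its old vector and adding the new one rather than rehashing the whole model.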