VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference

Ke Wang, Zishuo Zhao, Xinyuan Song, Bill Shi, Libin Xia, Chris Tong, Lynn Ai, Felix Qu, Eric Yang

TL;DR

VeriLLM tackles the verifiability problem in decentralized LLM inference by merging lightweight empirical reruns with cryptographic commitments, achieving public verifiability under a one-honest-verifier assumption with about 1% verification overhead. It introduces an isomorphic inference–verification network that multiplexes inference and verification across the same GPU workers, coupled with Merkle-root logging, VRF-based sampling, and on-chain dispute resolution to deter lazy or malicious behavior. The protocol converts the typically sequential decode phase into parallelizable prefill-based verification, enabling scalable auditing without heavy proofs or trusted execution environments. Experimental results demonstrate sub-1% overhead, robust detection of attacks, and near-linear scaling with verifier count, suggesting practical deployment for trustworthy decentralized AI services with auditable provenance.
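To illustrate why a finished generation can be checked in parallel even though producing it was sequential, here is a toy sketch. It is not a Transformer and not VeriLLM's code: `hidden_state`, `next_token`, `decode`, and `prefill_verify` are hypothetical names, and a hash of the token prefix stands in for a causal hidden state. The only property it is meant to show is that once every token is known, each position's state depends on a fixed prefix and can be recomputed independently.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def hidden_state(prefix: list[int]) -> str:
    """Toy stand-in for a causal hidden state: depends only on the token prefix."""
    return hashlib.sha256(bytes(prefix)).hexdigest()

def next_token(state: str, vocab: int = 50) -> int:
    """Hypothetical greedy rule that maps a state to the next token id."""
    return int(state[:2], 16) % vocab

def decode(prompt: list[int], n_new: int) -> tuple[list[int], list[str]]:
    """Sequential generation: token t+1 cannot be chosen before state t exists."""
    tokens = list(prompt)
    states = [hidden_state(tokens[: t + 1]) for t in range(len(tokens))]
    for _ in range(n_new):
        tokens.append(next_token(states[-1]))
        states.append(hidden_state(tokens))
    return tokens, states

def prefill_verify(tokens: list[int]) -> list[str]:
    """Verification: all tokens are known, so every position is recomputed independently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: hidden_state(tokens[: t + 1]), range(len(tokens))))

tokens, states = decode([1, 2, 3], n_new=5)
recomputed = prefill_verify(tokens)
```

In this sketch an honest run satisfies `recomputed == states`, while changing any generated token changes the recomputed state at that position and every later one.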

Abstract

Decentralized inference provides a scalable and resilient paradigm for serving large language models (LLMs), enabling distributed resource utilization and reducing reliance on centralized providers. However, in a permissionless environment without trusted nodes, ensuring the correctness of model outputs remains a core challenge. We introduce VeriLLM, a publicly verifiable protocol for decentralized LLM inference that achieves security under a one-honest-verifier assumption while maintaining practical efficiency. VeriLLM combines lightweight empirical rerunning with cryptographic commitments, allowing verifiers to validate results at approximately 1% of the underlying inference cost. To prevent verification bottlenecks, we design an isomorphic inference-verification architecture that multiplexes both inference and verification roles across the same GPU workers. This design (i) improves GPU utilization and overall throughput, (ii) enlarges the effective validator set, enhancing robustness and liveness, and (iii) enforces task indistinguishability to prevent node-specific optimizations or selective behavior. Through theoretical analysis and system-level evaluation, we show that VeriLLM achieves reliable public verifiability with minimal overhead, offering a practical foundation for trustworthy and scalable decentralized LLM inference.

Paper Structure

This paper contains 49 sections, 2 theorems, 40 equations, 4 figures, 6 tables.

Key Result

Lemma 1

Given a collision-resistant hash and a binding Merkle tree, a prover that posts a Merkle root $r_i$ cannot later open inconsistent values at any sampled index, except with negligible probability.
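The binding property can be made concrete with a minimal Merkle commitment sketch. This is an illustration under stated assumptions, not the paper's implementation: SHA-256, the domain-separation prefixes, the padding constant `PAD`, and the function names `build_levels`, `open_at`, and `verify` are all choices made here for the example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Assumed padding node for levels with an odd number of entries.
PAD = h(b"\x02pad")

def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, leaf hashes first; the root is levels[-1][0]."""
    level = [h(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix marks leaves
    levels = [level]
    while len(level) > 1:
        padded = level + [PAD] if len(level) % 2 else level
        level = [h(b"\x01" + padded[i] + padded[i + 1])  # 0x01 prefix marks inner nodes
                 for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels

def open_at(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append(level[sibling] if sibling < len(level) else PAD)
        index //= 2
    return path

def verify(root: bytes, leaf: bytes, index: int, path: list[bytes]) -> bool:
    """Recompute the root from a claimed leaf and its path, then compare."""
    node = h(b"\x00" + leaf)
    for sibling in path:
        if index % 2 == 0:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
        index //= 2
    return node == root

# Hypothetical serialized hidden states committed by a prover.
leaves = [b"state-0", b"state-1", b"state-2", b"state-3", b"state-4"]
levels = build_levels(leaves)
root = levels[-1][0]
proof = open_at(levels, 2)
```

Here `verify(root, b"state-2", 2, proof)` succeeds, while substituting any other value at index 2 fails unless the prover finds a hash collision, which is the intuition behind the lemma.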

Figures (4)

  • Figure 1: System architecture of VeriLLM. The scheduler employs a verifiable random function (VRF) for unbiased node selection and hidden-state sampling. Each node group performs prefill and decoding, commits hidden states, and submits proofs to the blockchain, which verifies results, counts votes, and applies reward or penalty mechanisms to ensure verifiable decentralized inference.
  • Figure 2: Overview of the decentralized inference architecture. Each segment hosts identical Transformer layers, and hidden states are relayed across the pipeline under cryptographic commitments.
  • Figure 3: Overview of the decentralized verification architecture. Each verifier independently reconstructs its segment’s outputs via a full-sequence prefill, comparing committed and recomputed states.
  • Figure 4: Verification workflow. Each verifier independently executes a full-sequence prefill over its model segment and commits its output root to the blockchain for later consistency checks.
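The comparison step described in Figures 3 and 4 can be sketched as follows. This is a simplified illustration, not the protocol itself: a SHA-256 hash of a public seed stands in for the VRF output, the tolerance `tol` is an assumed way to absorb floating-point differences between GPUs, and `sample_indices`, `states_match`, and `audit` are hypothetical names.

```python
import hashlib

def sample_indices(seed: bytes, n_positions: int, k: int) -> list[int]:
    """Derive k distinct positions deterministically from a seed.

    A hash is used here as a stand-in for a VRF output; the small modulo
    bias is ignored in this sketch.
    """
    k = min(k, n_positions)
    chosen: list[int] = []
    counter = 0
    while len(chosen) < k:
        digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        idx = int.from_bytes(digest[:8], "big") % n_positions
        if idx not in chosen:
            chosen.append(idx)
        counter += 1
    return chosen

def states_match(committed: list[float], recomputed: list[float], tol: float) -> bool:
    """Element-wise comparison within an assumed numerical tolerance."""
    return len(committed) == len(recomputed) and all(
        abs(a - b) <= tol for a, b in zip(committed, recomputed)
    )

def audit(committed: list[list[float]], recomputed: list[list[float]],
          seed: bytes, k: int, tol: float = 1e-3) -> list[int]:
    """Return the sampled positions whose committed and recomputed states disagree."""
    positions = sample_indices(seed, len(committed), k)
    return [i for i in positions if not states_match(committed[i], recomputed[i], tol)]

# Hypothetical data: 16 positions with 2-dimensional hidden states.
honest = [[0.1 * i, 0.2 * i] for i in range(16)]
rerun = [[x + 1e-5 for x in row] for row in honest]  # small numerical noise
tampered = [row[:] for row in honest]
tampered[5] = [9.0, 9.0]

clean_report = audit(honest, rerun, seed=b"round-1", k=4)
full_report = audit(tampered, rerun, seed=b"round-1", k=16)
```

Because the sampled positions are derived from a seed the prover cannot know when committing, a prover who alters states cannot choose which positions will be compared; sampling all positions, as in `full_report`, flags the altered one.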

Theorems & Definitions (2)

  • Lemma 1: Binding
  • Lemma 2: Unpredictability