VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference

Ke Wang; Zishuo Zhao; Xinyuan Song; Bill Shi; Libin Xia; Chris Tong; Lynn Ai; Felix Qu; Eric Yang

TL;DR

VeriLLM tackles the verifiability problem in decentralized LLM inference by merging lightweight empirical reruns with cryptographic commitments, achieving public verifiability under a one-honest-verifier assumption with about 1% verification overhead. It introduces an isomorphic inference–verification network that multiplexes inference and verification across the same GPU workers, coupled with Merkle-root logging, VRF-based sampling, and on-chain dispute resolution to deter lazy or malicious behavior. The protocol converts the typically sequential decode phase into parallelizable prefill-based verification, enabling scalable auditing without heavy proofs or trusted execution environments. Experimental results demonstrate sub-1% overhead, robust detection of attacks, and near-linear scaling with verifier count, suggesting practical deployment for trustworthy decentralized AI services with auditable provenance.
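The step from sequential decoding to parallel, prefill-style verification can be illustrated with a toy sketch. Nothing here is VeriLLM's implementation: `next_token` is a hypothetical stand-in for a deterministic model, `VOCAB` is an arbitrary toy vocabulary size, and the exact-match comparison is an assumption, since this summary does not describe the protocol's actual comparison rule. The sketch only shows why a verifier that already holds the full claimed output can check every position independently, while the original worker had to generate tokens one at a time.

```python
import hashlib

VOCAB = 50  # toy vocabulary size (assumption, not from the paper)


def next_token(prefix):
    """Hypothetical deterministic 'model': the next token is a hash of the prefix."""
    digest = hashlib.sha256(bytes(prefix)).digest()
    return digest[0] % VOCAB


def decode(prompt, n_new):
    """Sequential decode: step t cannot start before step t-1 has produced a token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_token(seq))
    return seq[len(prompt):]


def verify_prefill_style(prompt, claimed):
    """Check each claimed token against the model, given the claimed prefix.

    Every check reads only tokens that are already present in the claim, so
    the checks do not depend on one another and could run as one batched pass.
    Returns the index of the first mismatch, or None if every position agrees.
    """
    full = list(prompt) + list(claimed)
    checks = [
        next_token(full[: len(prompt) + i]) == claimed[i]
        for i in range(len(claimed))
    ]
    return None if all(checks) else checks.index(False)
```

In this toy setting an honest output passes, and altering a single token is flagged at the altered position, because the model's prediction for that position depends only on the untouched prefix before it.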

Abstract

Decentralized inference provides a scalable and resilient paradigm for serving large language models (LLMs), enabling distributed resource utilization and reducing reliance on centralized providers. However, in a permissionless environment without trusted nodes, ensuring the correctness of model outputs remains a core challenge. We introduce VeriLLM, a publicly verifiable protocol for decentralized LLM inference that achieves security under a one-honest-verifier assumption while maintaining practical efficiency. VeriLLM combines lightweight empirical rerunning with cryptographic commitments, allowing verifiers to validate results at approximately 1% of the underlying inference cost. To prevent verification bottlenecks, we design an isomorphic inference-verification architecture that multiplexes both inference and verification roles across the same GPU workers. This design (i) improves GPU utilization and overall throughput, (ii) enlarges the effective validator set, enhancing robustness and liveness, and (iii) enforces task indistinguishability to prevent node-specific optimizations or selective behavior. Through theoretical analysis and system-level evaluation, we show that VeriLLM achieves reliable public verifiability with minimal overhead, offering a practical foundation for trustworthy and scalable decentralized LLM inference.
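As a rough illustration of how a cryptographic commitment can be paired with a spot-check rerun, the sketch below builds a Merkle root over per-step records and lets a checker confirm that one sampled record belongs to the committed log. It is a minimal sketch under stated assumptions: the leaf contents (`step-i` byte strings), the SHA-256 hash, the domain-separation prefixes, and the function names `merkle_root`, `merkle_proof`, and `verify_leaf` are all hypothetical choices, not details taken from the paper.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _leaf_level(leaves):
    if not leaves:
        raise ValueError("need at least one leaf")
    # Prefix 0x00 marks leaf hashes, 0x01 marks internal nodes.
    return [_h(b"\x00" + leaf) for leaf in leaves]


def _parent_level(level):
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Commit to a list of byte strings with a single 32-byte root."""
    level = _leaf_level(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = _parent_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Return the sibling hashes needed to recompute the root from one leaf."""
    level = _leaf_level(leaves)
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = _parent_level(level)
        index //= 2
    return proof


def verify_leaf(root, leaf, proof):
    """Recompute the path from a leaf to the root and compare with the commitment."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        node = _h(b"\x01" + (sibling + node if sibling_is_left else node + sibling))
    return node == root
```

With a scheme of this shape, a worker would publish only the root up front; a verifier that later reruns a sampled step can compare its own result against the revealed leaf and confirm with the proof that the leaf was part of the original commitment.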
