SVIP: Towards Verifiable Inference of Open-source Large Language Models

Yifan Sun, Yuhang Li, Yue Zhang, Yuchen Jin, Huan Zhang

TL;DR

SVIP tackles verifiable inference for open-source LLMs in decentralized settings by requiring the computing provider to return processed hidden representations alongside the generated text. A simple hidden-state protocol verifies model usage with a proxy task trained exclusively on the specified model's representations, and a secret-based extension adds a controllable secret to prevent direct vector optimization attacks. The approach achieves false negative rates below 5% and false positive rates below 3%, supports aggregation over multiple prompts, and remains computationally lightweight (verification takes under 0.01 seconds per prompt). Extensive experiments across 5 specified LLMs and 6 alternatives demonstrate robustness to adaptive adversaries; a secret-update mechanism further hardens security and supports up to 80–120 million prompts between retraining. The work offers a practical, scalable framework for building trust in open-source LLM inference, bridging the gap between security guarantees and real-time deployment needs.

Abstract

The ever-increasing size of open-source Large Language Models (LLMs) renders local deployment impractical for individual users. Decentralized computing has emerged as a cost-effective solution, allowing individuals and small companies to perform LLM inference for users using surplus computational power. However, a computing provider may stealthily substitute the requested LLM with a smaller, less capable model without consent from users, thereby benefiting from cost savings. We introduce SVIP, a secret-based verifiable LLM inference protocol. Unlike existing solutions based on cryptographic or game-theoretic techniques, our method is computationally effective and does not rest on strong assumptions. Our protocol requires the computing provider to return both the generated text and processed hidden representations from LLMs. We then train a proxy task on these representations, effectively transforming them into a unique model identifier. With our protocol, users can reliably verify whether the computing provider is acting honestly. A carefully integrated secret mechanism further strengthens its security. We thoroughly analyze our protocol under multiple strong and adaptive adversarial scenarios. Our extensive experiments demonstrate that SVIP is accurate, generalizable, computationally efficient, and resistant to various attacks. Notably, SVIP achieves false negative rates below 5% and false positive rates below 3%, while requiring less than 0.01 seconds per prompt query for verification.

Paper Structure

This paper contains 63 sections, 2 theorems, 26 equations, 9 figures, 13 tables.

Key Result

Theorem 3.1

Suppose the protocol has per-query false negative rate $\mathrm{FNR}$ and false positive rate $\mathrm{FPR}$. Let $V_1,\dots,V_B \in \{0,1\}$ be the verification outcomes of $B$ independent queries, and let $\bar{V} = \frac{1}{B}\sum_{i=1}^B V_i .$ Consider any decision rule that declares the provid
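To make the aggregation setting concrete, the following is a minimal sketch, not the paper's decision rule. It assumes that $V_i = 1$ means query $i$ passed verification, that an honest provider passes each query with probability $1 - \mathrm{FNR}$ and a dishonest one with probability $\mathrm{FPR}$, and that the provider is accepted when $\bar{V} \ge \tau$. The threshold `tau` and the function names are hypothetical; the error rates are exact binomial tail probabilities under the independence assumption stated in the theorem.

```python
from math import ceil, comb


def binom_tail_ge(n: int, p: float, k: int) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    k = max(k, 0)
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def aggregated_error_rates(fnr: float, fpr: float, b: int, tau: float = 0.5):
    """Error rates of the assumed rule 'accept iff mean(V) >= tau' over b queries.

    Assumption: V_i = 1 means query i passed verification.
    Returns (aggregated FNR, aggregated FPR).
    """
    k = ceil(tau * b)  # minimum number of passing queries needed to accept
    # Honest provider is wrongly rejected when fewer than k queries pass.
    agg_fnr = 1.0 - binom_tail_ge(b, 1.0 - fnr, k)
    # Dishonest provider is wrongly accepted when at least k queries pass.
    agg_fpr = binom_tail_ge(b, fpr, k)
    return agg_fnr, agg_fpr


if __name__ == "__main__":
    for b in (1, 5, 11, 21):
        agg_fnr, agg_fpr = aggregated_error_rates(0.05, 0.03, b)
        print(f"B={b:2d}  FNR={agg_fnr:.3e}  FPR={agg_fpr:.3e}")
```

With the per-query rates quoted in the abstract as upper bounds (5% and 3%), both aggregated error rates shrink quickly as $B$ grows under this assumed rule, which is the qualitative behavior the theorem title "Exponential error decay under query aggregation" refers to.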

Figures (9)

  • Figure 1: The problem setting of verifiable inference for LLMs. (a) Our protocol involves three parties. (b) A user requests the computing provider (referred to as provider in the figure) to run inference on their prompt using the Llama-3.1-70B model. Without verification, they have no way to confirm if the specified model is used. (c) Our proposed protocol solves this by requiring the provider to return processed hidden representations from the LLM, enabling the user to verify through a verification function whether the correct model was used for inference. Specifically, the hidden representations are compressed to reduce the computational overhead.
  • Figure 2: Illustration of the motivation behind our framework. The proxy task is trained solely on hidden states from the specified LLM $M_{spec}$. During deployment, strong performance on the proxy task indicates that the provider used $M_{spec}$ as specified, while poor performance suggests otherwise.
  • Figure 3: Empirical distribution of the $L_2$ distance between the predicted proxy task output $f_{\phi^{*}}(z(x))$ and the label vector $y_{\gamma^*}(x,s)$ on the test dataset of LMSYS-Chat-1M. Each figure corresponds to a different specified model. The distributions compare the $L_2$ distances when the specified model is used versus various alternative models. The clear separation between the distributions, marked by the vertical threshold line, ensures the high accuracy of our protocol in distinguishing between correct and incorrect model usage. More examples can be found in \ref{sec:app_unseen}.
  • Figure 4: Attack Success Rate for the adapter attack, plotted as a function of the number of prompt samples collected under each single secret.
  • Figure 5: Illustration of (a) the simple protocol (Section \ref{sec: simple_protocol}); (b) the secret-based protocol (Section \ref{sec:methods_secret}).
  • ...and 4 more figures
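The per-query check described in the Figure 3 caption can be sketched as follows: compute the $L_2$ distance between the proxy task output and the label vector, and accept the query when the distance falls at or below a threshold. This is only an illustration. The function names, the plain-list vector representation, and the rule "accept when distance <= threshold" are assumptions; how $f_{\phi^{*}}(z(x))$ and $y_{\gamma^*}(x,s)$ are produced and how the threshold is chosen are not shown here.

```python
import math
from typing import Sequence


def l2_distance(pred: Sequence[float], label: Sequence[float]) -> float:
    """Euclidean distance between the proxy output and the label vector."""
    if len(pred) != len(label):
        raise ValueError("pred and label must have the same dimension")
    return math.dist(pred, label)


def verify_query(pred: Sequence[float], label: Sequence[float], threshold: float) -> bool:
    """Hypothetical check: accept the query if the L2 distance is within the threshold.

    pred stands in for f_phi*(z(x)) and label for y_gamma*(x, s);
    both are assumed to be already computed by the caller.
    """
    return l2_distance(pred, label) <= threshold


if __name__ == "__main__":
    # Toy vectors only; real label vectors and thresholds come from the protocol.
    print(verify_query([0.1, 0.2, 0.3], [0.1, 0.25, 0.3], threshold=0.1))
```

A small distance is read as evidence that the hidden representations came from the specified model, matching the separation between distributions that the figure reports.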

Theorems & Definitions (3)

  • Theorem 3.1
  • Theorem 2.1: Exponential error decay under query aggregation
  • proof