TensorCommitments: A Lightweight Verifiable Inference for Language Models

Oguzhan Baser, Elahe Sadeghi, Eric Wang, David Ribeiro Alves, Sam Kazemian, Hong Kang, Sandeep P. Chinchali, Sriram Vishwanath

TL;DR

This paper introduces TensorCommitments, a tensor-native, multivariate polynomial commitment framework for verifiable language-model inference, paired with Terkle Trees that authenticate evolving tensor states across layers and dialogue turns. By binding activation tensors to a single root commitment and supporting multivariate openings, the approach keeps prover and verifier overheads low (0.97% and 0.12% of inference time for LLaMA2) while preserving privacy and enabling targeted, tensor-shaped queries. A robustness-aware layer selection strategy further improves robustness to tailored attacks by up to 48% over the best prior work, which requires a verifier GPU. The method scales to long conversations and large models, providing practical, cryptographically sound verifiable inference that can be integrated into real-world inference pipelines with open-source tooling.

Abstract

Most large language models (LLMs) run on external clouds: users send a prompt, pay for inference, and must trust that the remote GPU executes the LLM without any adversarial tampering. We critically ask how to achieve verifiable LLM inference, where a prover (the service) must convince a verifier (the client) that an inference was run correctly without rerunning the LLM. Existing cryptographic works are too slow at the LLM scale, while non-cryptographic ones require a strong verifier GPU. We propose TensorCommitments (TCs), a tensor-native proof-of-inference scheme. TC binds the LLM inference to a commitment, an irreversible tag that breaks under tampering, organized in our multivariate Terkle Trees. For LLaMA2, TC adds only 0.97% prover and 0.12% verifier time over inference while improving robustness to tailored LLM attacks by up to 48% over the best prior work requiring a verifier GPU.

Paper Structure

This paper contains 32 sections, 9 theorems, 76 equations, 9 figures, 2 tables, and 1 algorithm.

Key Result

Proposition 4.4

Consider a TT over $n$ items with arity $B$ across $m$ tensor dimensions, so each internal node arranges its $B$ children as an order-$m$ tensor with side length $d$ per axis ($B = d^m$). The depth is $L = \lceil \log_{B^m} n \rceil$. (i) A Terkle membership proof $\pi_i^{\mathrm{T}}$ for any leaf …
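
The depth formula quoted above can be evaluated with exact integer arithmetic. The following is a minimal sketch, not the paper's implementation: it reads the logarithm base $B^m$ literally from the excerpted statement, and the function name `terkle_depth` and its parameter names are hypothetical.

```python
def terkle_depth(n_items: int, arity: int, num_dims: int) -> int:
    """Smallest L with (arity ** num_dims) ** L >= n_items.

    This equals ceil(log_{B^m} n) for n >= 1, with B = arity and
    m = num_dims, and avoids floating-point logarithms.
    Assumption: the base B^m is taken verbatim from the quoted statement.
    """
    if n_items < 1 or arity < 2 or num_dims < 1:
        raise ValueError("need n_items >= 1, arity >= 2, num_dims >= 1")
    base = arity ** num_dims
    depth, capacity = 0, 1
    while capacity < n_items:
        capacity *= base  # one more level multiplies the leaf capacity by the base
        depth += 1
    return depth


# Example: 64^4 leaves with base 64 (arity 64, one dimension) need 4 levels.
print(terkle_depth(64 ** 4, 64, 1))
```

Counting levels with integer multiplication sidesteps rounding problems that `math.log` can introduce when $n$ is an exact power of the base.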

Figures (9)

  • Figure 1: The key observation behind our TensorCommitments: multivariate interpolation is faster. We plot log-runtime to interpolate a polynomial over a fixed grid of $N = 2^{12}$ samples, reshaped from 1D ($2^{12}$ points) to $m$D grids ($2^{12/m} \times \cdots \times 2^{12/m}$) using Newton, Barycentric, and Gregory interpolation. Across all methods, moving from univariate ($m=1$) to bivariate ($m=2$) cuts runtime from 4.1s to 0.125s (over 30$\times$ speedup), with further reductions as dimension increases. The time bound $\mathcal{O}\!\left(\binom{m+\lfloor N^{1/m}\rfloor}{m}^2\right)$, derived in the complexity appendix, decreases sharply as the tensor dimension $m$ grows.
  • Figure 2: How does our verifiable inference pipeline work? Trusted setup: A secure enclave publishes a structured reference string $\mathsf{srs} = (g^{\tau}, g^{\tau^2}, \ldots)$ with the model and data. Prover: Runs model $\mathcal{M}(\mathcal{D};\theta_{\mathcal{M}})$ to produce activation tensors $T^{\mathcal{D}}_{\mathcal{M}}$; interpolates them into a multivariate polynomial $f_{T_{\mathcal{M}}^{\mathcal{D}}}$; commits to obtain $C_{f}$ and builds the Terkle tree $\mathcal{T}$; upon verifier challenge $\omega_i$, provides opening proofs $\pi^{\omega_i}_{C}$. Verifier: Uses spectral heavy-tail scores $\alpha_{\mathcal{M}}$ to rank layers, solves the interval selector, Problem 4.5, to choose challenges $\{\omega_i\}$, and checks the pairing $e(\cdot,\cdot)$, accepting only if all checks pass. The prover does all heavy work. The verifier checks pairings and does not re-run full inference.
  • Figure 3: Which verifiable tree aligns best with the tensor structure while keeping the proofs succinct? (Top) A $B$-ary Merkle tree commits to leaf values $C_i$ via hash labels $H_j^{d}$; a membership proof for a single leaf (highlighted in red) must include all sibling hashes along the path from leaf to root, treating the state as a flat list of values. (Bottom) A Verkle tree replaces hash parents with vector commitments $C_j^{1}$ and per-level opening proofs $\pi_j^{d}$, reducing proof size to $O(\log_B n)$ but still indexing children in one dimension, without exploiting any tensor structure in the underlying model or feature map. (Right) A Terkle tree commits at the root $C_1^{2}$ to a tensor-shaped grid of parameters or features (illustrated as colored regions on the base plane); each internal node $C_j^{1}$ corresponds to a multi-dimensional block, and each $\pi_j^{d}$ is a multivariate opening at a specific tensor index. This tensor-native organization authenticates the entire LLM or multi-agent state with a single root while answering queries about structured subsets (e.g., spatial patches) with fewer openings.
  • Figure 4: Where are LLMs most vulnerable to perturbations? For each model, we inject Gaussian noise into one layer at a time, scaled by the layer’s $\ell_2$ weight norm, and measure the absolute change in the predicted output-token probability. The plots aggregate per-layer sensitivities over the four quarters of network depth (Q1 to Q4), revealing that sensitivity is highly non-uniform across depth and architectures. For example, LLaMA2-7B (brown) is most sensitive in Q2, LLaMA2-13B (gray) peaks in Q1 and is least sensitive in Q3, and OPT-125M (turquoise) is least affected in Q4.
  • Figure 5: How do Terkle trees scale better than Merkle and Verkle trees? Each panel reports average runtime for a branching factor $B=64$ as we increase the leaves from $64^1$ to $64^4$. (Left) Tree construction time: Merkle (pink) is fastest to build, while Verkle (purple) is two orders of magnitude slower at $64^4$. Terkle provides up to 29$\times$ speed-up compared to Verkle and the smoothest scaling. (Middle) Proof generation time: Terkle reduces the proving time by up to 67$\times$ compared to Verkle. Merkle is the fastest but, at $B=64$, has a 63$\times$ larger proof size and no privacy guarantees. (Right) Verification time: Merkle verification cost increases steeply with data size since it must process $(B-1)$ sibling hash proofs per level, while the others need only a few proofs for the entire path. Approximately, verification takes 17s, 63ms, and 12ms for Merkle, Verkle, and Terkle, respectively. Terkle thus speeds up verification by 1416$\times$ over Merkle and 14$\times$ over Verkle. Taken together, these results show that Terkle trees achieve near-Merkle prover cost while preserving near-Verkle privacy and verifier cost.
  • ...and 4 more figures
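
Two quantities quoted in the captions above are easy to check numerically: the interpolation bound $\binom{m+\lfloor N^{1/m}\rfloor}{m}^2$ from Figure 1 and the $(B-1)$ sibling hashes per level that a $B$-ary Merkle path carries (Figures 3 and 5). The sketch below is illustrative only. The helper names are hypothetical, the bound is evaluated without its hidden constant, and counting one item per level for a commitment-based path is an assumption based on the per-level openings described for Figure 3.

```python
from math import comb


def floor_root(n: int, m: int) -> int:
    """Largest integer r with r**m <= n, corrected for float rounding."""
    r = round(n ** (1.0 / m))
    while r ** m > n:
        r -= 1
    while (r + 1) ** m <= n:
        r += 1
    return r


def interpolation_bound(n_samples: int, m: int) -> int:
    """Evaluate binom(m + floor(N^(1/m)), m)^2, ignoring the big-O constant."""
    return comb(m + floor_root(n_samples, m), m) ** 2


def merkle_path_hashes(num_levels: int, branching: int) -> int:
    """Sibling hashes in a B-ary Merkle membership proof: (B - 1) per level."""
    return (branching - 1) * num_levels


N = 2 ** 12  # the grid size used in Figure 1
bounds = {m: interpolation_bound(N, m) for m in (1, 2, 3, 4, 6, 12)}
for m, value in bounds.items():
    print(f"m={m:2d}  bound={value}")

# Assumed comparison: one opening per level for a commitment-based path,
# versus (B - 1) hashes per level for Merkle, at B = 64 and 4 levels.
levels, B = 4, 64
print(merkle_path_hashes(levels, B) // levels)  # per-level ratio, B - 1 = 63
```

For $N = 2^{12}$ the evaluated bound shrinks monotonically over the listed values of $m$, matching the trend described for Figure 1, and the per-level ratio of 63 matches the proof-size factor quoted for $B=64$ in Figure 5.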

Theorems & Definitions (27)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Definition 2.6
  • Remark 2.7
  • Definition 2.8
  • Definition 2.9
  • Remark 2.10
  • ...and 17 more