TensorCommitments: A Lightweight Verifiable Inference for Language Models

Oguzhan Baser; Elahe Sadeghi; Eric Wang; David Ribeiro Alves; Sam Kazemian; Hong Kang; Sandeep P. Chinchali; Sriram Vishwanath

TL;DR

This paper introduces TensorCommitments, a tensor-native, multivariate polynomial commitment framework for verifiable language-model inference, paired with Terkle Trees to efficiently authenticate evolving tensor states across layers and dialogue turns. By binding activation tensors to a single root commitment and enabling multivariate openings, the approach achieves low prover and verifier overheads while preserving privacy and enabling targeted, tensor-shaped queries. A robustness-aware layer selection strategy further strengthens resilience against tailored attacks, improving robustness by up to 48% over the best prior work at competitive cost. The method scales to long conversations and large models, providing practical, cryptographically sound verifiable inference that can be integrated into real-world inference pipelines with open-source tooling.

Abstract

Most large language models (LLMs) run on external clouds: users send a prompt, pay for inference, and must trust that the remote GPU executes the LLM without any adversarial tampering. We critically ask how to achieve verifiable LLM inference, where a prover (the service) must convince a verifier (the client) that an inference was run correctly without rerunning the LLM. Existing cryptographic works are too slow at the LLM scale, while non-cryptographic ones require a strong verifier GPU. We propose TensorCommitments (TCs), a tensor-native proof-of-inference scheme. TC binds the LLM inference to a commitment, an irreversible tag that breaks under tampering, organized in our multivariate Terkle Trees. For LLaMA2, TC adds only 0.97% prover and 0.12% verifier time over inference while improving robustness to tailored LLM attacks by up to 48% over the best prior work requiring a verifier GPU.

Paper Structure (32 sections, 9 theorems, 76 equations, 9 figures, 2 tables, 1 algorithm)

Introduction
Background
Problem Statement
Method
Experiments
Conclusion
Acknowledgment
Impact Statement
Polynomial Division Algorithm
Attack Models
Interpolation Methods
Newton Interpolation
Barycentric Interpolation
Gregory Interpolation
Dynamic Programming Solution and Analysis for The Optimization Problem
...and 17 more sections

Key Result

Proposition 4.4

Consider a TT over $n$ items with arity $B$ across $m$ tensor dimensions, so each internal node arranges its $B$ children as an order-$m$ tensor with side length $d$ per axis ($B = d^m$). The depth is $L = \lceil \log_{B^m} n \rceil$. (i) A Terkle membership proof $\pi_i^{\mathrm{T}}$ for any leaf $

Figures (9)

Figure 1: The key observation behind our TensorCommitments: multivariate interpolation is faster. We plot log-runtime to interpolate a polynomial over a fixed grid of $N\!=\!2^{12}$ samples, reshaped from 1D ($2^{12}$ points) to $m$D grids ($2^\frac{12}{m}\!\times\cdots\times 2^\frac{12}{m}$) using Newton, Barycentric, and Gregory interpolation. Across all three methods, moving from univariate ($m=1$) to bivariate ($m=2$) cuts runtime from 4.1s to 0.125s (over 30$\times$ speedup), with further reductions as dimension increases. The time bound $\mathcal{O}\!\left(\binom{m+\lfloor N^{1/m}\rfloor}{m}^2\right)$, derived in the complexity appendix, decreases sharply as the tensor dimension $m$ grows.
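To make the trend in the Figure 1 bound concrete, the following minimal sketch evaluates $\binom{m+\lfloor N^{1/m}\rfloor}{m}^2$ for $N = 2^{12}$ and several dimensions $m$. The helper names `int_root` and `interp_bound` are illustrative assumptions, not names from the paper, and the values are the bound's argument only (constants hidden by the $\mathcal{O}$ are ignored), so they say nothing about measured runtimes.

```python
from math import comb

def int_root(n: int, m: int) -> int:
    """Largest integer s with s**m <= n (exact, avoids float rounding)."""
    s = 1
    while (s + 1) ** m <= n:
        s += 1
    return s

def interp_bound(n: int, m: int) -> int:
    """Value of binom(m + floor(n^(1/m)), m)^2 from the Figure 1 caption."""
    return comb(m + int_root(n, m), m) ** 2

N = 2 ** 12
# Bound for the grid shapes 4096, 64x64, 16x16x16, 8x8x8x8, ...
bounds = {m: interp_bound(N, m) for m in (1, 2, 3, 4, 6, 12)}
for m, b in bounds.items():
    print(f"m={m:2d}  side={int_root(N, m):4d}  bound={b}")
```

The integer root is computed by search rather than `N ** (1 / m)` because floating-point roots can land just below the true integer and floor to the wrong side length.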
Figure 2: How does our verifiable inference pipeline work? Trusted setup: A secure enclave publishes the structured reference string $\mathsf{srs}\!=\!(g^{\tau},g^{\tau^2},\ldots)$ with the model and data. Prover: Runs model $\mathcal{M}(\mathcal{D};\theta_{\mathcal{M}})$ to produce activation tensors $T^{\mathcal{D}}_{\mathcal{M}}$; interpolates them into a multivariate polynomial $f_{T_{\mathcal{M}}^{\mathcal{D}}}$; commits to obtain $C_{f}$ and builds the Terkle tree $\mathcal{T}$; upon verifier challenge $\omega_i$, provides opening proofs $\pi^{\omega_i}_{C}$. Verifier: Uses spectral heavy-tail scores $\alpha_{\mathcal{M}}$ to rank layers, solves the interval selector, Problem 4.5, to choose the challenges $\{\omega_i\}$, and checks the pairing $e(\cdot,\cdot)$, accepting only if all checks pass. The prover does all heavy work. The verifier checks pairings and does not re-run full inference.
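The opening step in Figure 2 rests on a polynomial-division identity: a claimed value $y = f(\omega)$ is correct exactly when $f(x) - y$ is divisible by $(x - \omega)$. The sketch below illustrates only that identity, under loud simplifying assumptions: it is univariate rather than multivariate, it works over a toy prime field instead of a pairing group, and it evaluates at a secret point $\tau$ in the clear, which a real scheme never exposes. All function names and parameters are hypothetical and the code offers no security.

```python
P = 2**61 - 1  # toy prime field modulus (assumption, not from the paper)

def evaluate(coeffs, x):
    """Horner evaluation of sum(coeffs[k] * x**k) mod P (low-to-high order)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def quotient(coeffs, w):
    """Synthetic division: q(x) with f(x) - f(w) = q(x) * (x - w) mod P."""
    q = [0] * (len(coeffs) - 1)
    carry = 0
    for k in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[k] + w * carry) % P
        q[k - 1] = carry
    return q

def check_opening(coeffs, q, w, y, tau):
    """Check f(tau) - y == q(tau) * (tau - w), the identity a pairing would test."""
    return (evaluate(coeffs, tau) - y) % P == evaluate(q, tau) * (tau - w) % P

f = [3, 1, 4, 1, 5]            # toy polynomial standing in for the interpolant
omega, tau = 7, 123456789      # challenge point and (normally hidden) secret
y = evaluate(f, omega)         # honest claimed value f(omega)
q = quotient(f, omega)         # witness polynomial for the opening

honest = check_opening(f, q, omega, y, tau)
tampered = check_opening(f, q, omega, (y + 1) % P, tau)
print(honest, tampered)
```

In an actual commitment scheme the two sides of `check_opening` would be compared inside a pairing using group elements from $\mathsf{srs}$, so neither the verifier nor the prover learns $\tau$.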
Figure 3: Which verifiable tree aligns best with the tensor structure while keeping the proofs succinct? (Top) A $B$-ary Merkle tree commits to leaf values $C_i$ via hash labels $H_j^{d}$; a membership proof for a single leaf (highlighted in red) must include all sibling hashes along the path from leaf to root, treating the state as a flat list of values. (Bottom) A Verkle tree replaces hash parents with vector commitments $C_j^{1}$ and per-level opening proofs $\pi_j^{d}$, reducing proof size to $O(\log_B n)$ but still indexing children in one dimension, without exploiting any tensor structure in the underlying model or feature map. (Right) A Terkle tree commits at the root $C_1^{2}$ to a tensor-shaped grid of parameters or features (illustrated as colored regions on the base plane); each internal node $C_j^{1}$ corresponds to a multi-dimensional block, and each $\pi_j^{d}$ is a multivariate opening at a specific tensor index. This tensor-native organization allows authenticating the entire LLM or multi-agent states with a single root while informing about structured subsets (e.g., spatial patches) using fewer openings.
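As a baseline for the comparison in Figure 3, here is a minimal sketch of a $B$-ary Merkle tree with membership proofs, showing why each proof carries $B-1$ sibling hashes per level. It assumes SHA-256 as the hash and omits domain separation between leaves and internal nodes; the function names are illustrative, and this is the plain hash-tree baseline, not the Verkle or Terkle construction.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """Hash the concatenation of the given byte strings with SHA-256."""
    return hashlib.sha256(b"".join(parts)).digest()

def build(leaves, B):
    """Return all tree levels, from hashed leaves up to the single root."""
    levels = [[h(x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(*cur[i:i + B]) for i in range(0, len(cur), B)])
    return levels

def prove(levels, idx, B):
    """Collect (position, sibling hashes) for each level below the root."""
    proof = []
    for level in levels[:-1]:
        start = (idx // B) * B
        group = level[start:start + B]
        pos = idx - start
        proof.append((pos, group[:pos] + group[pos + 1:]))
        idx //= B
    return proof

def verify(root, leaf, proof):
    """Recompute the path to the root from the leaf and its siblings."""
    node = h(leaf)
    for pos, sibs in proof:
        node = h(*sibs[:pos], node, *sibs[pos:])
    return node == root

B = 4
leaves = [bytes([i]) for i in range(16)]   # 16 toy leaves, depth 2
levels = build(leaves, B)
root = levels[-1][0]
proof = prove(levels, 5, B)
n_hashes = sum(len(sibs) for _, sibs in proof)
print(verify(root, leaves[5], proof), n_hashes)
```

With 16 leaves and $B = 4$ the proof holds $(B-1) \times 2 = 6$ hashes; commitment-based trees replace each group of siblings with an opening proof, which is where the size reduction described in the caption comes from.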
Figure 4: Where are LLMs most vulnerable to perturbations? For each model, we inject Gaussian noise into one layer at a time, scaled by the layer’s $\ell_2$ weight norm, and measure the absolute change in the predicted output-token probability. The plots aggregate per-layer sensitivities over the first to fourth network quarters, revealing that sensitivity is highly non-uniform across depth and architectures. For example, LLaMA2-7B (brown) is most sensitive in Q2, while LLaMA2-13B (gray) peaks in Q1 and is least sensitive in Q3, yet OPT-125M (turquoise) is least affected in Q4.
Figure 5: How do Terkle trees scale better than Merkle and Verkle trees? Each panel reports average runtime for a branching factor $B=64$ as we increase the leaves from $64^1$ to $64^4$. (Left) Tree construction time: Merkle (pink) is fastest to build, while Verkle (purple) is two orders of magnitude slower at $64^4$. Terkle provides up to 29$\times$ speed-up compared to Verkle and the smoothest scaling. (Middle) Proof generation time: Terkle reduces the proving time by up to 67$\times$ compared to Verkle. Merkle is the fastest, but with $B=64$ its proofs are 63$\times$ larger and carry no privacy guarantees. (Right) Verification time: Merkle verification cost increases steeply with data size since it must process $(B-1)$ sibling hash proofs per level, while the others need only a few proofs for the entire path. Verification takes approximately 17s, 63ms, and 12ms for Merkle, Verkle, and Terkle, respectively, which the authors report as speedups of 1416$\times$ and 14$\times$ for Terkle over Merkle and Verkle. Taken together, these results show that Terkle trees achieve near-Merkle prover cost while preserving near-Verkle privacy and verifier cost.
...and 4 more figures

Theorems & Definitions (27)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Definition 2.5
Definition 2.6
Remark 2.7
Definition 2.8
Definition 2.9
Remark 2.10
...and 17 more
