TensorCommitments: A Lightweight Verifiable Inference for Language Models
Oguzhan Baser, Elahe Sadeghi, Eric Wang, David Ribeiro Alves, Sam Kazemian, Hong Kang, Sandeep P. Chinchali, Sriram Vishwanath
TL;DR
This paper introduces TensorCommitments, a tensor-native, multivariate polynomial commitment framework for verifiable language-model inference, paired with Terkle Trees that efficiently authenticate evolving tensor states across layers and dialogue turns. By binding activation tensors to a single root commitment and supporting multivariate openings, the approach keeps prover and verifier overheads low while preserving privacy and allowing targeted, tensor-shaped queries. A robustness-aware layer selection strategy adds resilience against tailored attacks, improving robustness by up to 48% over the best prior work at competitive cost. The method scales to long conversations and large models, providing practical, cryptographically sound verifiable inference that can be integrated into real-world inference pipelines with open-source tooling.
Abstract
Most large language models (LLMs) run on external clouds: users send a prompt, pay for inference, and must trust that the remote GPU executes the LLM without adversarial tampering. We ask how to achieve verifiable LLM inference, where a prover (the service) must convince a verifier (the client) that an inference was run correctly, without the verifier rerunning the LLM. Existing cryptographic approaches are too slow at LLM scale, while non-cryptographic ones require a powerful GPU on the verifier side. We propose TensorCommitments (TCs), a tensor-native proof-of-inference scheme. TCs bind the LLM inference to a commitment, an irreversible tag that breaks under tampering, organized in our multivariate Terkle Trees. For LLaMA2, TCs add only 0.97% prover time and 0.12% verifier time over inference, while improving robustness to tailored LLM attacks by up to 48% over the best prior work, which requires a verifier GPU.
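To make the idea of a commitment that "breaks under tampering" concrete, the sketch below binds a list of per-layer activations to a single root tag. It is a minimal illustration only: it uses an ordinary SHA-256 hash tree as a stand-in, not the multivariate polynomial commitments or Terkle Trees of the paper, and the names `leaf_hash`, `node_hash`, `merkle_root`, and `commit` are hypothetical helpers introduced here. The toy activation values are made up for the example.

```python
import hashlib
import struct


def leaf_hash(layer_index, activations):
    """Hash one layer's flattened activations (a list of floats).

    The layer index is included so identical tensors at different
    layers produce different leaves. The 0x00 prefix separates leaf
    hashes from internal-node hashes.
    """
    payload = struct.pack(">I", layer_index)
    payload += struct.pack(f">{len(activations)}d", *activations)
    return hashlib.sha256(b"\x00" + payload).digest()


def node_hash(left, right):
    """Hash two child digests into a parent digest (0x01 prefix)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(leaves):
    """Fold a non-empty list of leaf digests into a single root digest."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def commit(layer_activations):
    """Commit to all layers at once; returns a 32-byte root tag."""
    leaves = [leaf_hash(i, acts) for i, acts in enumerate(layer_activations)]
    return merkle_root(leaves)


# Toy data (assumed for illustration): three layers, two values each.
honest = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
tampered = [[0.1, 0.2], [0.3, 0.4001], [0.5, 0.6]]

root_honest = commit(honest)
root_tampered = commit(tampered)

# Perturbing a single activation in one layer changes the root tag.
print(root_honest.hex())
print(root_honest != root_tampered)
```

A hash tree like this only shows the binding property. Unlike the scheme described above, it offers no polynomial openings and no tensor-shaped queries, and checking a leaf requires access to the full activation values for that layer.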
