Verifiable Provenance of Software Artifacts with Zero-Knowledge Compilation
Javier Ron, Martin Monperrus
TL;DR
This work addresses software provenance by establishing a cryptographic link between source code, compiler, and produced binary. It introduces verifiable compilation inside a zero-knowledge virtual machine (zkVM): the compiler runs inside the zkVM, which produces a succinct receipt proving that the binary was generated by the claimed compiler from the claimed source. A proof of concept called CosmicTurtle uses the RISC Zero zkVM and the ChibiCC C compiler. It is evaluated on 200 synthetic programs plus 52 real-world source files from cryptographic libraries, and its provenance guarantees hold in adversarial scenarios. Proving adds substantial one-time overhead, but verification remains fast and scalable, so provenance can be checked widely without re-running builds or trusting hardware TEEs. The work demonstrates practical feasibility and offers a hardware-agnostic, cryptographically grounded approach to securing software supply chains.
Abstract
Verifying that a compiled binary originates from its claimed source code is a fundamental security requirement, called source code provenance. Achieving verifiable source code provenance in practice remains challenging. The most popular technique, reproducible builds, requires verifiers to match the original build toolchain and environment and then re-execute the build, which is difficult. We propose a novel approach to verifiable provenance based on compiling software with zero-knowledge virtual machines (zkVMs). By executing a compiler within a zkVM, our system produces both the compiled output and a cryptographic proof attesting that the compilation was performed on the claimed source code with the claimed compiler. We build a proof of concept using the RISC Zero zkVM and the ChibiCC C compiler, and evaluate it on 200 synthetic programs as well as 31 OpenSSL and 21 libsodium source files. Our results show that zk-compilation is applicable to real-world software and provides strong security guarantees: all adversarial tests targeting compiler substitution, source tampering, output manipulation, and replay attacks are successfully blocked.
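To make the verifier's side of this idea concrete, the following Python sketch models only the public claim that such a proof would bind together: an identifier for the compiler, a hash of the source, and a hash of the output. It is a simplified illustration under stated assumptions, not the authors' implementation. The names Claim, make_claim, and check_claim are hypothetical and are not part of RISC Zero or CosmicTurtle, and the zero-knowledge proof itself is deliberately omitted. In a real system, the verifier would first check the cryptographic proof and only then trust the claim.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Claim:
    """Hypothetical public statement attached to a compilation proof."""
    compiler_id: str  # digest identifying the compiler run inside the zkVM
    source_hash: str  # digest of the source code that was compiled
    output_hash: str  # digest of the binary that was produced


def make_claim(compiler: bytes, source: bytes, output: bytes) -> Claim:
    """Build the claim a prover would publish (the proof is not modeled)."""
    return Claim(sha256_hex(compiler), sha256_hex(source), sha256_hex(output))


def check_claim(claim: Claim, expected_compiler_id: str,
                source: bytes, binary: bytes) -> bool:
    """Check that the claim matches the compiler, source, and binary at hand.

    Assumption: the proof accompanying the claim has already been verified
    by the zkVM's own verifier, which this sketch does not implement.
    """
    return (
        claim.compiler_id == expected_compiler_id
        and claim.source_hash == sha256_hex(source)
        and claim.output_hash == sha256_hex(binary)
    )


if __name__ == "__main__":
    # Placeholder byte strings standing in for real artifacts.
    compiler = b"compiler-image"
    source = b"int main(void) { return 0; }"
    binary = b"compiled-output"

    claim = make_claim(compiler, source, binary)
    trusted_id = sha256_hex(compiler)

    print("honest build:     ", check_claim(claim, trusted_id, source, binary))
    print("tampered source:  ", check_claim(claim, trusted_id, source + b" ", binary))
    print("swapped compiler: ", check_claim(claim, sha256_hex(b"other"), source, binary))
```

The comparison is a few hash computations and never re-runs the compiler, which is why verification can stay cheap even when proving is expensive. Replay protection is not modeled here.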
