Verifiable Provenance of Software Artifacts with Zero-Knowledge Compilation

Javier Ron, Martin Monperrus

TL;DR

This work tackles software provenance by ensuring a cryptographic link between source code, compiler, and produced binary. It introduces verifiable compilation inside a zero-knowledge VM (zkVM), producing a succinct receipt that proves the binary was generated by the claimed compiler from the claimed source. A proof-of-concept called CosmicTurtle uses the RISC Zero zkVM and the ChibiCC C compiler, and is evaluated on 200 synthetic programs plus 52 real-world cryptographic files, showing strong provenance guarantees even in adversarial scenarios. While zk-proving adds substantial one-time overhead, verification remains fast and scalable, enabling widespread provenance checks without re-running builds or trusting hardware TEEs. The work demonstrates practical feasibility and offers a hardware-agnostic, cryptographically grounded approach to secure software supply chains.

Abstract

Verifying that a compiled binary originates from its claimed source code is a fundamental security requirement, called source code provenance. Achieving verifiable source code provenance in practice remains challenging. The most popular technique, called reproducible builds, requires difficult matching and re-execution of build toolchains and environments. We propose a novel approach to verifiable provenance based on compiling software with zero-knowledge virtual machines (zkVMs). By executing a compiler within a zkVM, our system produces both the compiled output and a cryptographic proof attesting that the compilation was performed on the claimed source code with the claimed compiler. We implement a proof of concept using the RISC Zero zkVM and the ChibiCC C compiler, and evaluate it on 200 synthetic programs as well as 31 OpenSSL and 21 libsodium source files. Our results show that zk-compilation is applicable to real-world software and provides strong security guarantees: all adversarial tests targeting compiler substitution, source tampering, output manipulation, and replay attacks are successfully blocked.

Paper Structure (29 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: Provenance guarantees in CosmicTurtle. The compiler is executed inside RISC Zero's zkVM, which records the computation trace while the source program is compiled. The zkVM then generates a succinct cryptographic proof attesting that the compilation occurred on the claimed source code with the claimed compiler, i.e. cryptographically verifiable provenance.
  • Figure 2: System overview showing the three phases of verifiable compilation: (1) compiler handshake where prover and verifier agree on the compiler binary and its cryptographic identity, (2) compilation with proof where the prover executes the compiler within a zkVM to generate both the binary output and a cryptographic proof, and (3) verification where the verifier validates the proof against the committed software artifacts, i.e. validates that the binary is correctly obtained from the claimed source code and compiler. Once verified, the verifier has integrity guarantees to run the program.
  • Figure 3: Threat model for verifiable compilation. The prover generates compilation artifacts while the verifier validates them. Four threats target the main artifacts: (1) source code tampering before or during compilation, (2) compiler substitution with a malicious binary, (3) binary code tampering after compilation, and (4) replay attacks reusing valid proofs with different artifacts.
  • Figure 4: Experimental results across randomly generated programs from Csmith and real-world programs from libsodium and OpenSSL (C file sizes in KB, receipt sizes in MB). (a) Compilation time vs program size shows linear scaling. (b) Receipt size vs program size demonstrates reasonable storage requirements for the generated proofs. (c) Compilation time vs verification time shows verification is consistently faster than proof generation.
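
The verifier-side checks implied by Figures 2 and 3 can be illustrated with a small sketch. This is a toy model under stated assumptions, not CosmicTurtle's implementation and not the RISC Zero API: the `Receipt` class, the `prove_compilation` and `verify` functions, and the use of SHA-256 digests are all hypothetical names and choices made for illustration, and the actual zero-knowledge proof is replaced by a boolean placeholder. The sketch only shows how binding the compiler, source, and output identities into one receipt lets a verifier detect each of the four threats.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


def digest(data: bytes) -> str:
    """SHA-256 hex digest, used here as the artifact identity (an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Receipt:
    """Hypothetical stand-in for a zkVM receipt; only public claims are modeled."""
    compiler_id: str    # identity of the compiler that ran inside the zkVM
    source_digest: str  # identity of the source code it was given
    output_digest: str  # identity of the binary it produced
    proof_ok: bool      # placeholder for the real cryptographic proof check


def prove_compilation(
    compiler: bytes, source: bytes, compile_fn: Callable[[bytes], bytes]
) -> Tuple[bytes, Receipt]:
    """Phase 2 (toy): run the compiler and bind all three artifacts in a receipt."""
    output = compile_fn(source)
    receipt = Receipt(digest(compiler), digest(source), digest(output), True)
    return output, receipt


def verify(
    receipt: Receipt, agreed_compiler_id: str, source: bytes, binary: bytes
) -> List[str]:
    """Phase 3 (toy): return the list of failed checks; empty means accepted."""
    failures = []
    if not receipt.proof_ok:
        failures.append("invalid proof")
    if receipt.compiler_id != agreed_compiler_id:
        failures.append("compiler substitution")
    if receipt.source_digest != digest(source):
        failures.append("source mismatch")
    if receipt.output_digest != digest(binary):
        failures.append("binary mismatch")
    return failures


if __name__ == "__main__":
    compiler = b"toy-compiler-binary"           # stands in for the agreed compiler
    toy_compile = lambda src: b"BIN:" + src[::-1]  # stands in for real compilation
    source = b"int main(void) { return 0; }"

    binary, receipt = prove_compilation(compiler, source, toy_compile)
    print(verify(receipt, digest(compiler), source, binary))            # honest run
    print(verify(receipt, digest(compiler), source, binary + b"\x00"))  # tampered binary
```

In this model a replay attack is simply a valid receipt presented alongside different artifacts, so it surfaces as a source or binary mismatch. A real verifier would replace `proof_ok` with the zkVM's proof verification.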