Trustless Audits without Revealing Data or Models

Suppakit Waiwitlikhit, Ion Stoica, Yi Sun, Tatsunori Hashimoto, Daniel Kang

TL;DR

Trustless audits for ML training are possible without revealing data or weights by publishing commitments and zero-knowledge proofs that training occurred on the committed data. ZkAudit-T proves SGD steps from hidden data/weights, and ZkAudit-I enables arbitrary audits by proving $F(\text{data},\text{weights})$ with a ZK-SNARK, while maintaining privacy. The approach extends ZK-SNARKs to gradient descent via rounded division and fixed-point arithmetic and introduces a high-performance softmax to achieve competitive accuracy on ImageNet-scale tasks. Empirical results show feasible proving costs and near FP32 accuracy on image classification and recommender systems, with practical audit costs for censorship, counterfactual, and copyright-related checks. The work demonstrates a viable path to privacy-preserving, trustless ML audits with broad potential impact and clear directions for scaling to larger models and language tasks.

Abstract

There is an increasing conflict between business incentives to hide models and data as trade secrets, and the societal need for algorithmic transparency. For example, a rightsholder wishing to know whether their copyrighted works have been used during training must convince the model provider to allow a third party to audit the model and data. Finding a mutually agreeable third party is difficult, and the associated costs often make this approach impractical. In this work, we show that it is possible to simultaneously allow model providers to keep their model weights (but not architecture) and data secret while allowing other parties to trustlessly audit model and data properties. We do this by designing a protocol called ZkAudit in which model providers publish cryptographic commitments of datasets and model weights, alongside a zero-knowledge proof (ZKP) certifying that published commitments are derived from training the model. Model providers can then respond to audit requests by privately computing any function F of the dataset (or model) and releasing the output of F alongside another ZKP certifying the correct execution of F. To enable ZkAudit, we develop new methods of computing ZKPs for SGD on modern neural nets for simple recommender systems and image classification models capable of high accuracies on ImageNet. Empirically, we show it is possible to provide trustless audits of DNNs, including copyright, censorship, and counterfactual audits with little to no loss in accuracy.
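
The publish-then-audit flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: a salted SHA-256 hash stands in for the commitment scheme (this section does not say which scheme ZkAudit uses), the names `commit`, `audit_contains`, and `respond` are hypothetical, and no zero-knowledge proof is generated. In ZkAudit the commitment is never opened to the auditor; a ZKP certifies that F was run on the committed data.

```python
import hashlib
import secrets
from typing import List, Optional, Tuple


def commit(items: List[bytes], salt: Optional[bytes] = None) -> Tuple[str, bytes]:
    """Salted SHA-256 commitment over a list of byte strings (stand-in scheme)."""
    salt = secrets.token_bytes(32) if salt is None else salt
    h = hashlib.sha256(salt)
    for item in items:
        # Length-prefix each item so item boundaries are unambiguous.
        h.update(len(item).to_bytes(8, "big"))
        h.update(item)
    return h.hexdigest(), salt


def audit_contains(dataset: List[bytes], query: bytes) -> bool:
    """Hypothetical audit function F: was `query` part of the training data?"""
    return query in dataset


# Provider side: the data and weights stay private, only digests are published.
dataset = [b"image-001", b"image-002", b"image-003"]
weights = [b"\x00\x01", b"\x02\x03"]
data_commitment, data_salt = commit(dataset)
weight_commitment, weight_salt = commit(weights)
published = {"data": data_commitment, "weights": weight_commitment}


def respond(query: bytes) -> dict:
    """Answer an audit request with the output of F and the commitment it refers to.

    A real response would also carry a ZKP that F was evaluated on the
    committed dataset; that part is omitted here.
    """
    return {"commitment": published["data"], "output": audit_contains(dataset, query)}


print(respond(b"image-002"))
```

The salt keeps the digest from leaking whether a guessed dataset matches, and the length prefix prevents two different item lists from hashing to the same byte stream.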

Paper Structure

This paper contains 25 sections, 14 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Test accuracy vs cost of proving training across the entire dataset for the Pareto frontier of image classification. Higher is better. The dashed line is the fp32 accuracy.
  • Figure 2: Test MSE vs total training cost for the Pareto frontier for the recommender system. Lower is better.
  • Figure 3: Test MSE vs scale factor. ZkAudit-T achieves parity with fp32 at $2^{13}$.
  • Figure 4: Test accuracy vs scale factor. With a scale factor of $2^{15}$, accuracy is within 0.7% of full precision. Accuracy degrades at lower scale factors.
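
The scale factor in Figures 3 and 4 refers to fixed-point arithmetic: real values are stored as integers multiplied by a power of two, and products are brought back to that scale with a rounded division. The following is a minimal sketch of the idea, not the paper's circuit. The helper names and the round-half-up rule are assumptions; only the scale factor $2^{15}$ comes from the figure caption.

```python
SCALE_BITS = 15          # scale factor 2^15, as in the Figure 4 caption
SCALE = 1 << SCALE_BITS


def to_fixed(x: float, scale: int = SCALE) -> int:
    """Encode a real number as an integer multiple of 1/scale."""
    return round(x * scale)


def to_float(q: int, scale: int = SCALE) -> float:
    """Decode a fixed-point integer back to a float."""
    return q / scale


def rounded_div(a: int, b: int) -> int:
    """Divide a by b > 0 and round to the nearest integer (ties round up).

    Equivalent to floor(a / b + 1/2), computed with integers only.
    """
    return (2 * a + b) // (2 * b)


def fixed_mul(a: int, b: int, scale: int = SCALE) -> int:
    """Multiply two fixed-point values and rescale with rounded division."""
    return rounded_div(a * b, scale)


def sgd_step(w: int, grad: int, lr: int, scale: int = SCALE) -> int:
    """One fixed-point SGD update: w <- w - lr * grad."""
    return w - fixed_mul(lr, grad, scale)


w = to_fixed(1.0)
grad = to_fixed(0.5)
lr = to_fixed(0.25)
print(to_float(sgd_step(w, grad, lr)))  # 0.875
```

A larger scale factor shrinks the rounding error of each encoded value (at most half of `1 / SCALE`), which is consistent with the accuracy trend the captions describe, at the price of larger integers inside the proof.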