Engineering Trustworthy Machine-Learning Operations with Zero-Knowledge Proofs
Filippo Scaramuzza, Giovanni Quattrocchi, Damian A. Tamburri
TL;DR
The paper tackles the challenge of verifying trustworthy AI within MLOps under regulatory pressures by leveraging Zero-Knowledge Proofs to provide tamper-proof, privacy-preserving evidence of correct computations. It systematically surveys ZKP protocols and analyzes ZKP-enhanced ML across the TDSP lifecycle, identifying five key ZKP properties and a convergence toward a unified ZKMLOps framework. The authors find that current work concentrates on inference verification while data preprocessing and training stages are understudied, highlighting a path to end-to-end, verifiable ML pipelines that satisfy accountability and regulatory demands. This work lays a foundation for practical, auditable AI systems by outlining a roadmap for ZKMLOps, including emphasis on post-quantum readiness, federated learning integration, and toolchains to guide practitioners in selecting suitable ZKP techniques.
Abstract
As Artificial Intelligence (AI) systems, particularly those based on machine learning (ML), become integral to high-stakes applications, their probabilistic and opaque nature poses significant challenges to traditional verification and validation methods. These challenges are exacerbated in regulated sectors that require tamper-proof, auditable evidence, as highlighted by legal frameworks such as the EU AI Act. Zero-Knowledge Proofs (ZKPs) offer a cryptographic solution: they enable provers to demonstrate, through verified computations, adherence to set requirements without revealing sensitive model details or data. Through a systematic survey of ZKP protocols, we identify five key properties (non-interactivity, transparent setup, standard representations, succinctness, and post-quantum security) critical for their application in AI validation and verification pipelines. Subsequently, we perform a follow-up systematic survey analyzing ZKP-enhanced ML applications across an adaptation of the Team Data Science Process (TDSP) model (Data & Preprocessing, Training & Offline Metrics, Inference, and Online Metrics), detailing verification objectives, ML models, and adopted protocols. Our findings indicate that current research on ZKP-enhanced ML primarily focuses on inference verification, while the data preprocessing and training stages remain underexplored. Most notably, our analysis identifies a significant convergence within the research domain toward the development of a unified Zero-Knowledge Machine Learning Operations (ZKMLOps) framework. This emerging framework leverages ZKPs to provide robust cryptographic guarantees of correctness, integrity, and privacy, thereby promoting enhanced accountability, transparency, and compliance with Trustworthy AI principles.
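To make the notion of a non-interactive zero-knowledge proof concrete, the following minimal sketch shows a Schnorr proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. It is an illustration only and is not taken from the surveyed protocols: the group parameters `P`, `Q`, `G` are toy values chosen for readability (far too small to be secure), and the function names `prove` and `verify` are hypothetical. The prover convinces the verifier that it knows `secret` behind `public = G^secret mod P` without disclosing `secret`.

```python
import hashlib
import secrets
from typing import Tuple

# Toy parameters (assumption, insecure): G = 2 generates the subgroup
# of prime order Q = 11 in the multiplicative group modulo P = 23.
P, Q, G = 23, 11, 2


def _challenge(*values: int) -> int:
    """Fiat-Shamir: derive the challenge by hashing the public transcript."""
    data = b"|".join(str(v).encode() for v in values)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret: int) -> Tuple[int, int, int]:
    """Return (public, commitment, response) proving knowledge of `secret`."""
    public = pow(G, secret, P)
    nonce = secrets.randbelow(Q)           # fresh randomness hides the secret
    commitment = pow(G, nonce, P)
    c = _challenge(G, public, commitment)  # replaces the verifier's message
    response = (nonce + c * secret) % Q
    return public, commitment, response


def verify(public: int, commitment: int, response: int) -> bool:
    """Check G^response == commitment * public^c (mod P)."""
    c = _challenge(G, public, commitment)
    return pow(G, response, P) == (commitment * pow(public, c, P)) % P


if __name__ == "__main__":
    public, commitment, response = prove(secret=7)
    print("proof accepted:", verify(public, commitment, response))
```

The verifier only ever sees `public`, `commitment`, and `response`. ZKP-enhanced ML systems apply the same prove/verify pattern to far larger statements, such as the correct execution of a model's inference, using general-purpose proof systems rather than this single-purpose protocol.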
