
Privacy-Preserving UCB Decision Process Verification via zk-SNARKs

Xikun Jiang, He Lyu, Chenhao Ying, Yibin Xu, Boris Düdder, Yuan Luo

TL;DR

The paper tackles privacy and verifiability in reinforcement learning for Multi-Armed Bandit problems by integrating zk-SNARKs with the UCB algorithm to shield sensitive data and parameters while enabling external verification. It introduces zkUCB, which encodes the entire UCB process as a deterministic arithmetic circuit and uses Groth16 zk-SNARKs, employing seeding for randomness, polynomial approximations for nonpolynomial operations, and quantization to fit finite-field arithmetic. Key contributions include a new framework combining zk-SNARKs with MAB, a theoretical mapping of UCB to arithmetic circuits, and empirical evidence that appropriate quantization can boost rewards while keeping proof sizes linear in execution steps, enabling scalable, privacy-preserving decision verification. The work holds significant practical implications for privacy-sensitive domains such as healthcare AI, where third-party verification is needed without disclosing private data or model parameters. Overall, zkUCB advances privacy-preserving verifiable reinforcement learning and demonstrates favorable scalability characteristics for real-world deployment.
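To make the three ingredients named above more concrete (seeded randomness, quantization, a deterministic decision rule), here is a minimal Python sketch. It is not the paper's implementation. It assumes the textbook UCB1 index, Bernoulli rewards, and a simple fixed-point rounding; the names `quantize` and `run_quantized_ucb` are hypothetical. `math.sqrt` and `math.log` stand in for the polynomial approximations a real circuit would need.

```python
import math
import random


def quantize(x: float, frac_bits: int) -> int:
    """Map a real value to a fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def run_quantized_ucb(arm_means, steps, frac_bits=8, seed=0):
    """UCB1 on Bernoulli arms; arm scores are compared as fixed-point integers.

    Assumption: textbook UCB1 index mean + sqrt(2 ln t / n). The paper's exact
    index, quantization scheme, and approximations may differ.
    """
    rng = random.Random(seed)      # a fixed seed makes the whole run reproducible
    k = len(arm_means)
    counts = [0] * k               # number of pulls per arm
    sums = [0] * k                 # integer reward totals per arm (rewards are 0 or 1)
    choices = []
    for t in range(1, steps + 1):
        if t <= k:
            arm = t - 1            # play every arm once before using the index
        else:
            scores = [
                quantize(
                    sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
                    frac_bits,
                )
                for i in range(k)
            ]
            arm = scores.index(max(scores))   # ties go to the lowest arm index
        reward = 1 if rng.random() < arm_means[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        choices.append(arm)
    return choices, counts, sum(sums)


if __name__ == "__main__":
    choices, counts, total = run_quantized_ucb([0.2, 0.5, 0.8], steps=200, seed=42)
    print(counts, total)
```

Because the seed fixes every random draw, two runs with the same arguments return the same sequence of choices. A circuit-based proof of the run depends on that kind of determinism.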

Abstract

As machine learning is applied ever more widely, balancing the privacy of data and algorithm parameters against the verifiability of machine learning remains a challenge. This study explores the intersection of reinforcement learning and data privacy, specifically addressing the Multi-Armed Bandit (MAB) problem with the Upper Confidence Bound (UCB) algorithm. We introduce zkUCB, an algorithm that employs Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge (zk-SNARKs) to enhance UCB. zkUCB is designed to safeguard the confidentiality of training data and algorithmic parameters while keeping UCB decision-making transparent. Experiments highlight zkUCB's superior performance; we attribute its enhanced reward to a judicious choice of quantization bits that reduces information entropy in the decision-making process. zkUCB's proof size and verification time scale linearly with its number of execution steps, which shows a balance between data security and operational efficiency. The approach contributes to the ongoing discourse on data privacy in complex decision-making processes and offers a promising solution for privacy-sensitive applications.

Paper Structure (27 sections, 1 theorem, 3 equations, 6 figures, 2 algorithms)

This paper contains 27 sections, 1 theorem, 3 equations, 6 figures, 2 algorithms.

Key Result

Theorem 1

The zkUCB scheme is a non-interactive zero-knowledge argument of knowledge with completeness and perfect zero-knowledge. It has computational knowledge soundness against Probabilistic Polynomial-Time (PPT) adversaries.
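For orientation, the three properties named in the theorem are usually formalized along the following lines. The notation is assumed here for illustration and is not taken from the paper: $R$ is the relation, $x$ the public statement, $w$ the private witness, $\lambda$ the security parameter, $td$ a simulation trapdoor, and $\mathsf{Sim}$ and $\mathcal{E}$ a simulator and an extractor.

```latex
% Standard-style definitions; notation is an assumption, not the paper's.
% Completeness: honest proofs for true statements always verify.
\[
\Pr\big[\mathsf{Verify}(vk, x, \pi) = 1 \;\big|\;
  (pk, vk, td) \leftarrow \mathsf{Setup}(R),\;
  \pi \leftarrow \mathsf{Prove}(pk, x, w)\big] = 1
\quad \text{for all } (x, w) \in R .
\]
% Perfect zero-knowledge: simulated proofs are identically distributed to real ones.
\[
\big\{\pi \leftarrow \mathsf{Prove}(pk, x, w)\big\}
\;\equiv\;
\big\{\pi \leftarrow \mathsf{Sim}(td, x)\big\}
\quad \text{for all } (x, w) \in R .
\]
% Computational knowledge soundness: for every PPT adversary \mathcal{A}
% there is a PPT extractor \mathcal{E} such that
\[
\Pr\big[\mathsf{Verify}(vk, x, \pi) = 1 \,\wedge\, (x, w) \notin R \;\big|\;
  (x, \pi) \leftarrow \mathcal{A}(pk, vk),\;
  w \leftarrow \mathcal{E}(pk, vk)\big] \le \mathrm{negl}(\lambda) .
\]
```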

Figures (6)

  • Figure 1: An illustration of zkUCB. We integrate zk-SNARK with the UCB algorithm. In zkUCB, the complete reinforcement learning process is encapsulated as a statement, which is used to model a deterministic arithmetic circuit that uses addition gates and multiplication gates for arithmetic calculations. The arithmetic circuit then serves as the first step in zk-SNARK execution.
  • Figure 2: Application of zk-SNARKs in the AI Doctor System Using UCB Algorithm.
  • Figure 3: Average reward generated during each round by UCB and zkUCB over 100 iterations.
  • Figure 4: The average time during various phases.
  • Figure 5: The average time for verifying proof.
  • ...and 1 more figure
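The Figure 1 caption describes a deterministic arithmetic circuit built from addition and multiplication gates. The following Python sketch shows what evaluating such a circuit over a finite field can look like. The gate encoding, the function name `eval_circuit`, and the modulus `P` are assumptions chosen for illustration; they are not the paper's circuit or the field used by Groth16.

```python
# Illustrative prime modulus (the Mersenne prime 2^61 - 1); an assumption,
# not the field of any particular zk-SNARK system.
P = 2**61 - 1


def eval_circuit(gates, inputs):
    """Evaluate a list of gates over the field of integers mod P.

    Each gate is (op, left, right), where left and right index earlier wires.
    Input wires come first; each gate appends one new wire.
    """
    wires = [x % P for x in inputs]
    for op, left, right in gates:
        if op == "add":
            wires.append((wires[left] + wires[right]) % P)
        elif op == "mul":
            wires.append((wires[left] * wires[right]) % P)
        else:
            raise ValueError(f"unknown gate type: {op}")
    return wires


if __name__ == "__main__":
    # Computes 3 * 4 + 5 with one multiplication gate and one addition gate.
    gates = [("mul", 0, 1), ("add", 3, 2)]
    wires = eval_circuit(gates, [3, 4, 5])
    print(wires[-1])  # 17
```

The full list of wire values is the kind of execution trace a prover would commit to. Real-valued quantities must first be quantized to integers before they can be placed on wires like these.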

Theorems & Definitions (2)

  • Theorem 1
  • proof