VPAS: Publicly Verifiable and Privacy-Preserving Aggregate Statistics on Distributed Datasets

Mohammed Alghazwi, Dewi Davies-Batista, Dimka Karastoyanova, Fatih Turkmen

TL;DR

VPAS tackles privacy-preserving aggregation with input validation and public verifiability on distributed datasets. It introduces a dual-construction framework built on distributed verifiable encryption (DVE), verifiable aggregation (VA), and verifiable re-encryption (VRE), which enables arbitrary input validation and publicly checkable proofs without relying on non-colluding servers, backed by a Groth16 zkSNARK and a blockchain-style ledger for auditability. The paper demonstrates practicality via a GWAS case study and reports verifiability overhead 10x lower than that of conventional zkSNARK-only approaches, which broadens the range of applications that can afford it. The work offers a concrete pathway to auditable, privacy-preserving statistics in sensitive domains such as healthcare and genomics, with a scalable protocol architecture and an emphasis on public verifiability and data provenance.

Abstract

Aggregate statistics play an important role in extracting meaningful insights from distributed data while preserving privacy. A growing number of application domains, such as healthcare, utilize these statistics in advancing research and improving patient care. In this work, we explore the challenge of input validation and public verifiability within privacy-preserving aggregation protocols. We address the scenario in which a party receives data from multiple sources and must verify the validity of the input and correctness of the computations over this data to third parties, such as auditors, while ensuring input data privacy. To achieve this, we propose the "VPAS" protocol, which satisfies these requirements. Our protocol utilizes homomorphic encryption for data privacy, and employs Zero-Knowledge Proofs (ZKP) and a blockchain system for input validation and public verifiability. We constructed VPAS by extending existing verifiable encryption schemes into secure protocols that enable N clients to encrypt, aggregate, and subsequently release the final result to a collector in a verifiable manner. We implemented and experimentally evaluated VPAS with regard to encryption costs, proof generation, and verification. The findings indicate that the overhead associated with verifiability in our protocol is 10x lower than that incurred by simply using conventional zkSNARKs. This enhanced efficiency makes it feasible to apply input validation with public verifiability across a wider range of applications or use cases that can tolerate moderate computational overhead associated with proof generation.
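To make the "encrypt, aggregate, release" flow concrete, the following is a minimal sketch of additively homomorphic aggregation using a toy exponential ElGamal scheme. It is an illustration under stated assumptions, not the VPAS construction: the choice of scheme, the group parameters `P`, `Q`, `G`, and the function names `keygen`, `encrypt`, `aggregate`, and `decrypt` are all hypothetical, the parameters are far too small to be secure, and the zero-knowledge proofs, distributed keys, and ledger that VPAS relies on are omitted entirely.

```python
import secrets

# Hypothetical toy parameters (NOT secure): P = 2*Q + 1 is a safe prime and
# G = 4 generates the subgroup of prime order Q in the integers modulo P.
P = 2879
Q = 1439
G = 4


def keygen():
    """Return (secret_key, public_key) for the toy scheme."""
    sk = secrets.randbelow(Q - 1) + 1          # sk in [1, Q-1]
    return sk, pow(G, sk, P)


def encrypt(pk, m):
    """Encrypt a small non-negative integer m as (G^r, G^m * pk^r)."""
    r = secrets.randbelow(Q - 1) + 1           # fresh randomness per ciphertext
    return pow(G, r, P), (pow(G, m, P) * pow(pk, r, P)) % P


def aggregate(ciphertexts):
    """Multiply ciphertexts component-wise; the plaintexts add in the exponent."""
    c1, c2 = 1, 1
    for a, b in ciphertexts:
        c1 = (c1 * a) % P
        c2 = (c2 * b) % P
    return c1, c2


def decrypt(sk, ciphertext, max_sum):
    """Recover the plaintext sum by brute-force discrete log up to max_sum."""
    c1, c2 = ciphertext
    # c1 lies in the order-Q subgroup, so c1^(Q - sk) equals c1^(-sk).
    g_m = (c2 * pow(c1, Q - sk, P)) % P
    acc = 1
    for m in range(max_sum + 1):
        if acc == g_m:
            return m
        acc = (acc * G) % P
    raise ValueError("sum exceeds max_sum; it must stay below the group order Q")


# Four clients encrypt private values; the aggregator never sees plaintexts.
sk, pk = keygen()
client_inputs = [12, 7, 200, 31]
encrypted_sum = aggregate(encrypt(pk, m) for m in client_inputs)
total = decrypt(sk, encrypted_sum, max_sum=1000)
```

Because decryption in this sketch searches for the exponent, the aggregate has to stay small, which is one reason schemes of this kind typically split messages into short chunks. Whether VPAS chunks messages for the same reason is not stated in the text above.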

Paper Structure

This paper contains 43 sections, 23 equations, 4 figures, 8 tables, 7 algorithms.

Figures (4)

  • Figure 1: VPAS System Overview showing the components of the system and their interactions. The clients send their encrypted data to the aggregator and their proofs to the distributed ledger. The aggregator processes the input, sends the result to the collector, and submits a proof to the distributed ledger. The auditor verifies the execution of the protocol by inspecting the distributed ledger.
  • Figure 2: VPAS Protocol
  • Figure 3: Benchmarking results showcasing the impact of varying parameters: (A) Number of constraints with 8 clients and an 8-bit chunk size, (B) Number of clients with $2^{10}$ constraints and an 8-bit chunk size, and (C) Message chunk sizes with $2^{10}$ constraints and 8 clients.
  • Figure 4: GWAS benchmark results with varying input.