Protecting Data Buyer Privacy in Data Markets

Minxing Zhang; Jian Pei

TL;DR

This paper tackles the problem of protecting data buyers' privacy in data markets, an area largely neglected by prior work focused on data owners and third parties. It formalizes buyer privacy using a true intent $V_1\wedge\cdots\wedge V_n$ and a published intent $U_1\wedge\cdots\wedge U_n$, and introduces three attacker models—PI-uniform, efficiency maximization, and purchased record inference—within a $\lambda$-privacy framework. It proposes an expansion-based protection method and multiple allocation strategies to minimize disclosure while balancing purchase cost, and validates the approach with extensive experiments on real (Adult) and synthetic datasets, showing substantial privacy gains with manageable utility loss. The results provide actionable guidance on how dimensionality, true intent size, and parameter settings ($\lambda$, $\alpha$) influence privacy-utility trade-offs, supporting practical deployment in data marketplaces.

Abstract

Data markets serve as crucial platforms facilitating data discovery, exchange, sharing, and integration among data users and providers. However, privacy research in this setting has predominantly centered on protecting data owners and third parties, neglecting the challenges of protecting the privacy of data buyers. In this article, we address this gap by modeling data buyer privacy protection and investigating the trade-off between privacy and purchase cost. Comprehensive experiments shed light on the efficacy and efficiency of our proposed approaches.

Paper Structure (22 sections, 22 equations, 4 figures, 4 tables)


Related Work
Data Buyer Privacy and Attacker Models
Published Intent-based Attack
Efficiency Maximization Attack
Purchased Record Inference Attack
Problem Formulation
PI-uniform Attack
Efficiency Maximization Attack
Purchased Record Inference Attack
Data Buyer Privacy Protection
PI-uniform Attack / Efficiency Maximization Attack
Purchased Record Inference Attack
Experimental Results
Experimental Setup
Privacy and Utility of Published Intent
...and 7 more sections

Figures (4)

Figure 1: A Running Example to Illustrate the Expansion Method
Figure 2: Effect of True Intent Size on Attacker's Confidence, Number of Records Purchased in True Intent (TI) and Published Intent (PI), and Published Intent Size regarding Dimensions "Age", "Ethnicity", and "Hours per Week" for PI-uniform Attack (PI-uniform) and Efficiency Maximization Attack (EM).
Figure 3: Effect of Privacy Threshold $\lambda$.
Figure 4: Effect of Weight Parameter $\alpha$
