Measuring the Hidden Cost of Data Valuation through Collective Disclosure

Patrick Mesana; Gilles Caporossi; Sebastien Gambs

TL;DR

The paper addresses the hidden cost of data valuation by modeling a Data Union (DU) that coordinates collective disclosure under differential privacy to regulate value distribution. It introduces the Information Disclosure Game (IDG), a Stackelberg framework where the DU sets iterative, DP-enabled disclosure policies and the Data Consumer (DC) acquires data to meet a utility target, revealing an explicit acquisition cost. Through Yelp-based experiments using $k$-NN and SBERT embeddings, the authors show that valuation inherently entails exploration costs, with Shapley-based and bandit strategies each capable of achieving target utility under budget constraints. The findings highlight the need for minimum dividend guarantees to ensure inclusivity and suggest future work on extending valuation to differentiable models and gradient-based Shapley approximations to enhance scalability and privacy–utility trade-offs.
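To make the Shapley-based valuation mentioned above concrete, the following is a minimal sketch of permutation-sampling (Monte Carlo) data Shapley with a $k$-NN utility. It is an illustration under stated assumptions, not the authors' implementation: the function names `knn_accuracy` and `monte_carlo_shapley`, the binary labels, the accuracy-based utility, and the toy features (standing in for SBERT embeddings of reviews) are all hypothetical choices made for this example.

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test, k=3):
    """Accuracy of a majority-vote k-NN classifier on binary labels.

    Assumption: an empty training set has utility 0.0.
    """
    if len(X_train) == 0:
        return 0.0
    k = min(k, len(X_train))
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]  # labels of the k nearest neighbours
    pred = (votes.mean(axis=1) >= 0.5).astype(int)  # ties go to class 1
    return float((pred == y_test).mean())

def monte_carlo_shapley(X_train, y_train, X_test, y_test, n_perms=200, seed=0):
    """Estimate each training point's Shapley value by sampling permutations.

    For every sampled ordering, a point is credited with the change in
    utility observed when it is appended to the points that precede it.
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = 0.0  # utility of the empty coalition
        for j in range(n):
            idx = perm[: j + 1]
            cur = knn_accuracy(X_train[idx], y_train[idx], X_test, y_test)
            values[perm[j]] += cur - prev
            prev = cur
    return values / n_perms
```

Each utility evaluation in the inner loop corresponds to data that had to be collected and assessed before any value could be assigned to it, including for points whose estimated value ends up near zero. Because the marginal contributions telescope within each permutation, the estimates sum to the utility of the full training set minus that of the empty set.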

Abstract

Data valuation methods assign a marginal utility to each data point that contributed to the training of a machine learning model. If used directly as a payout mechanism, this creates a hidden cost of valuation: contributors with near-zero marginal value receive nothing, even though their data had to be collected and assessed. To formalize this cost, we introduce a conceptual and game-theoretic model, the Information Disclosure Game, played between a Data Union (DU, sometimes also called a data trust), which is a member-run agent representing contributors, and a Data Consumer (DC, e.g., a platform). After first aggregating members' data, the DU releases information progressively by adding Laplace noise under a differentially private mechanism. Through simulations with strategies guided by data Shapley values and multi-armed bandit exploration, we demonstrate on a Yelp review helpfulness prediction task that data valuation inherently incurs an explicit acquisition cost and that the DU's collective disclosure policy changes how this cost is distributed across members.
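The progressive release described in the abstract can be sketched with the standard Laplace mechanism. This is a minimal illustration, not the paper's disclosure policy: the names `laplace_release` and `progressive_disclosure`, the per-round budget schedule `epsilons`, the L1 sensitivity bound, and the use of basic sequential composition to track the total privacy budget are all assumptions made for this example.

```python
import numpy as np

def laplace_release(x, sensitivity, epsilon, rng):
    """Return x plus Laplace noise of scale sensitivity / epsilon.

    For a query with the given L1 sensitivity, this is the standard
    epsilon-differentially-private Laplace mechanism.
    """
    x = np.asarray(x, dtype=float)
    scale = sensitivity / epsilon
    return x + rng.laplace(loc=0.0, scale=scale, size=x.shape)

def progressive_disclosure(x, sensitivity, epsilons, seed=0):
    """Release x once per round with the per-round budgets in `epsilons`.

    Returns a list of (cumulative_epsilon, noisy_release) pairs. The
    cumulative budget uses basic sequential composition (assumed here),
    i.e. the per-round epsilons simply add up.
    """
    rng = np.random.default_rng(seed)
    spent = 0.0
    releases = []
    for eps in epsilons:
        spent += eps
        releases.append((spent, laplace_release(x, sensitivity, eps, rng)))
    return releases
```

A larger per-round epsilon yields a smaller noise scale, so later rounds in an increasing schedule disclose more accurate information at a higher privacy cost.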
