Data Pricing for Graph Neural Networks without Pre-purchased Inspection
Yiping Liu, Mengxiao Zhang, Jiamou Liu, Song Yang
TL;DR
This work tackles data pricing in graph-based model marketplaces where data cannot be inspected before payment. It introduces SIMT, a two-phase mechanism. In the procurement phase, SIMT ranks data by a structural-importance score derived from structural entropy and PageRank, then buys data through cluster-wise auctions under a budget constraint. In the training phase, it trains a GNN using feature propagation and edge augmentation with a contrastive objective. The mechanism guarantees incentive compatibility, individual rationality, and budget feasibility, and it improves MacroF1 and MicroF1 by up to 40% over baselines across five datasets. By using graph topology to infer data value without disclosing raw attributes or labels, the approach makes data pricing for graph-structured data practical in decentralized settings.
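To make the idea of a topology-only importance score concrete, the following is a minimal Python sketch. It is not the paper's scoring formula: the blend weight `alpha`, the function names, and the choice to mix PageRank with the per-node term of one-dimensional structural entropy are all assumptions made for illustration. It only shows that a ranking can be computed from edges alone, without features or labels.

```python
import math

def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on a graph given as {node: [neighbors]}."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in adj}
        for v, nbrs in adj.items():
            if not nbrs:
                # Dangling node: spread its mass uniformly over all nodes.
                for u in adj:
                    new[u] += damping * rank[v] / n
            else:
                share = damping * rank[v] / len(nbrs)
                for u in nbrs:
                    new[u] += share
        rank = new
    return rank

def degree_entropy_terms(adj):
    """Per-node term of one-dimensional structural entropy: -(d/2m) * log2(d/2m)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    terms = {}
    for v, nbrs in adj.items():
        p = len(nbrs) / two_m if two_m else 0.0
        terms[v] = -p * math.log2(p) if p > 0 else 0.0
    return terms

def structural_scores(adj, alpha=0.5):
    """Hypothetical blend of PageRank and normalized entropy terms (assumption)."""
    pr = pagerank(adj)
    se = degree_entropy_terms(adj)
    se_total = sum(se.values()) or 1.0
    return {v: alpha * pr[v] + (1.0 - alpha) * se[v] / se_total for v in adj}

# Toy star graph: "a" is the hub, so it should rank first.
graph = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
scores = structural_scores(graph)
ranking = sorted(graph, key=scores.get, reverse=True)
```

Because both components are normalized to sum to one, the blended scores also sum to one, which makes them easy to compare across owners within the same graph.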
Abstract
Machine learning (ML) models have become essential tools in many scenarios. Their effectiveness, however, hinges on having enough data to reach satisfactory performance. Model marketplaces have thus emerged as crucial platforms that connect model consumers seeking ML solutions with data owners who hold valuable data. These marketplaces rely on model trading mechanisms to incentivize data owners to contribute their data and to return a well-performing ML model to the model consumers. However, existing model trading mechanisms often assume that data owners are willing to share their data before being paid, which is unrealistic in the real world. We therefore propose a novel mechanism, named Structural Importance based Model Trading (SIMT), that assesses data importance and compensates data owners accordingly without disclosing the data. Specifically, SIMT procures feature and label data from data owners according to their structural importance, and then trains a graph neural network for the model consumers. Theoretically, SIMT is incentive compatible, individually rational, and budget feasible. Experiments on five popular datasets show that SIMT consistently outperforms vanilla baselines, by up to $40\%$ in both MacroF1 and MicroF1.
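As an illustration of what budget feasibility and individual rationality mean in a procurement auction, here is a minimal Python sketch of a proportional-share style rule. This is a generic textbook-style rule swapped in for illustration, not SIMT's auction: the function name, the inputs `bids` and `values`, and the payment formula are assumptions, and no claim about incentive compatibility is made for this simplified version.

```python
def proportional_share_auction(bids, values, budget):
    """Select owners by ascending bid-per-value and pay within the budget.

    bids:   {owner: asked price}           (hypothetical input)
    values: {owner: importance score > 0}  (hypothetical input)
    """
    # Cheapest cost per unit of importance first.
    order = sorted(bids, key=lambda o: bids[o] / values[o])
    winners, total_value = [], 0.0
    for o in order:
        candidate_total = total_value + values[o]
        # Accept while the owner's rate fits the budget's per-value share.
        if bids[o] / values[o] <= budget / candidate_total:
            winners.append(o)
            total_value = candidate_total
        else:
            break
    if not winners:
        return {}
    # Rate asked by the first rejected owner, if there is one.
    if len(winners) < len(order):
        nxt = order[len(winners)]
        next_rate = bids[nxt] / values[nxt]
    else:
        next_rate = float("inf")
    # Pay each winner in proportion to value, capped by the budget share.
    rate = min(budget / total_value, next_rate)
    return {o: values[o] * rate for o in winners}

bids = {"u1": 2.0, "u2": 3.0, "u3": 9.0}
values = {"u1": 4.0, "u2": 3.0, "u3": 3.0}
budget = 8.0
payments = proportional_share_auction(bids, values, budget)
```

In this toy run, `u1` and `u2` win, each is paid at least what it asked (individual rationality), and the payments add up to no more than the budget (budget feasibility). In a cluster-wise setting, a rule of this kind could be run once per cluster with that cluster's share of the budget; how SIMT actually splits the budget is not specified here.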
