Data Pricing for Graph Neural Networks without Pre-purchased Inspection

Yiping Liu, Mengxiao Zhang, Jiamou Liu, Song Yang

TL;DR

This work tackles data pricing in graph-based model marketplaces when data cannot be inspected before payment. It introduces SIMT, a two-phase mechanism. In the first phase, it ranks data by a structural-importance score derived from structural entropy and PageRank and procures data through cluster-wise auctions under a budget constraint. In the second phase, it trains a GNN using feature propagation and edge augmentation with a contrastive objective. The mechanism guarantees incentive compatibility, individual rationality, and budget feasibility, and improves MacroF1/MicroF1 by up to 40% over baselines across five datasets. By using graph topology to infer data value without disclosing raw attributes or labels, the approach makes data pricing for graph-structured data practical and enables effective model training when data is held by many independent owners.
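
The TL;DR does not say how structural entropy and PageRank are combined, so the following is only a minimal sketch under stated assumptions, not SIMT's formula. PageRank is computed by plain power iteration. The clustering is passed in as given (in SIMT it presumably comes from the structural-entropy analysis itself). Each node's entropy contribution is the standard two-level structural-entropy term -(d_v / vol(G)) * log2(d_v / vol(C_v)), where C_v is the node's cluster. The two quantities are mixed with a hypothetical weight `alpha`. All function names are hypothetical.

```python
import math
from collections import defaultdict


def pagerank(adj, damping=0.85, iters=200, tol=1e-12):
    """PageRank by power iteration on an undirected adjacency dict."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A dangling node (no neighbours) spreads its mass uniformly.
            targets = adj[v] if adj[v] else nodes
            share = damping * rank[v] / len(targets)
            for u in targets:
                new[u] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def entropy_terms(adj, cluster):
    """Per-node term -(d_v / vol(G)) * log2(d_v / vol(C_v)) of two-level structural entropy."""
    deg = {v: len(adj[v]) for v in adj}
    vol_g = sum(deg.values())
    vol_c = defaultdict(int)
    for v in adj:
        vol_c[cluster[v]] += deg[v]
    return {
        v: 0.0 if deg[v] == 0 else -(deg[v] / vol_g) * math.log2(deg[v] / vol_c[cluster[v]])
        for v in adj
    }


def rank_by_cluster(adj, cluster, alpha=0.5):
    """Rank nodes inside each cluster by a HYPOTHETICAL score alpha*PR + (1-alpha)*entropy term."""
    pr = pagerank(adj)
    ent = entropy_terms(adj, cluster)
    score = {v: alpha * pr[v] + (1 - alpha) * ent[v] for v in adj}
    groups = defaultdict(list)
    for v in adj:
        groups[cluster[v]].append(v)
    return {c: sorted(vs, key=score.get, reverse=True) for c, vs in groups.items()}
```

On two triangles joined by a single bridge edge, for example `adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}` with one cluster per triangle, the two bridge endpoints (nodes 2 and 3) rank first in their clusters because they have both the highest PageRank and the largest entropy term.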

Abstract

Machine learning (ML) models have become essential tools in various scenarios. Their effectiveness, however, hinges on a substantial volume of data for satisfactory performance. Model marketplaces have thus emerged as crucial platforms bridging model consumers seeking ML solutions and data owners possessing valuable data. These marketplaces leverage model trading mechanisms to properly incentivize data owners to contribute their data and to return a well-performing ML model to model consumers. However, existing model trading mechanisms often assume that data owners are willing to share their data before being paid, which is unrealistic in practice. We therefore propose a novel mechanism, Structural Importance based Model Trading (SIMT), that assesses data importance and compensates data owners accordingly without disclosing the data. Specifically, SIMT procures feature and label data from data owners according to their structural importance and then trains a graph neural network for model consumers. Theoretically, SIMT is incentive compatible, individually rational, and budget feasible. Experiments on five popular datasets show that SIMT consistently outperforms vanilla baselines by up to 40% in both MacroF1 and MicroF1.
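
The abstract does not detail how owners are paid. To illustrate how a procurement step can be incentive compatible, individually rational, and budget feasible at once, here is a minimal sketch of a standard uniform-price reverse auction. It is not SIMT's own rule. Within a cluster, owners are assumed to be equally valuable. They are sorted by reported cost, the largest k with c_(k) <= B/k win, and each winner is paid min(B/k, c_(k+1)), which is the threshold bid above which that winner would lose. The budget split proportional to cluster size is a hypothetical choice, and the function names are hypothetical too.

```python
import math
from collections import defaultdict


def threshold_auction(costs, budget):
    """Uniform-price reverse auction for equal-value items; returns {winner: payment}."""
    order = sorted(costs, key=costs.get)  # ascending reported cost
    k = 0
    for j, owner in enumerate(order, start=1):
        if costs[owner] <= budget / j:
            k = j  # keep the largest j with c_(j) <= B / j
    if k == 0:
        return {}
    # Threshold price: a winner bidding above it would drop out of the winning set.
    next_cost = costs[order[k]] if k < len(order) else math.inf
    price = min(budget / k, next_cost)
    return {owner: price for owner in order[:k]}


def clusterwise_procurement(costs, cluster, budget):
    """Run one auction per cluster; the budget split by cluster size is HYPOTHETICAL."""
    members = defaultdict(list)
    for owner, c in cluster.items():
        members[c].append(owner)
    total = len(cluster)
    payments = {}
    for owners in members.values():
        sub_budget = budget * len(owners) / total
        payments.update(threshold_auction({o: costs[o] for o in owners}, sub_budget))
    return payments
```

With costs `{"a": 1, "b": 2, "c": 3, "d": 10}` and budget 8, owners `a` and `b` win and each receives 3. Owner `c` could win by underbidding 1.5, but it would then be paid 8/3, which is below its true cost. The budget split does not depend on bids, and each owner takes part only in its own cluster's auction. The per-cluster guarantees therefore carry over: winners are paid at least their cost, total payments stay within the budget, and misreporting cannot raise an owner's utility.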

Paper Structure

This paper contains 31 sections, 1 theorem, 9 equations, 2 figures, 13 tables, and 1 algorithm.

Key Result

Theorem 1

The SIMT mechanism is incentive compatible, individually rational, and budget feasible.
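
For reference, here is how these three properties are usually formalized for a procurement (reverse) mechanism. The notation is ours, not necessarily the paper's. Owner $i$ has private cost $c_i$ and reports bid $b_i$. The mechanism outputs an allocation $x_i(\mathbf{b}) \in \{0,1\}$ and a payment $p_i(\mathbf{b})$, and $B$ is the consumer's budget. Incentive compatibility says truthful bidding is a dominant strategy. Individual rationality says a truthful owner never loses. Budget feasibility says total payments never exceed $B$.

```latex
\begin{aligned}
u_i(b_i,\mathbf{b}_{-i}) &= p_i(b_i,\mathbf{b}_{-i}) - c_i\,x_i(b_i,\mathbf{b}_{-i}) \\
\text{(IC)}\quad u_i(c_i,\mathbf{b}_{-i}) &\ge u_i(b_i,\mathbf{b}_{-i}) && \forall i,\ b_i,\ \mathbf{b}_{-i} \\
\text{(IR)}\quad u_i(c_i,\mathbf{b}_{-i}) &\ge 0 && \forall i,\ \mathbf{b}_{-i} \\
\text{(BF)}\quad \textstyle\sum_i p_i(\mathbf{b}) &\le B && \forall \mathbf{b}
\end{aligned}
```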

Figures (2)

  • Figure 1: The framework of structural importance-based model trading (SIMT) mechanism.
  • Figure 2: Proportion of intra-class and inter-class edges
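
Figure 2 splits edges into intra-class edges, whose endpoints share a label, and inter-class edges. The intra-class share is often called the edge homophily ratio. The sketch below computes both fractions. It assumes that each undirected edge is listed once and that every node has a label. The function name is hypothetical.

```python
def edge_class_proportions(edges, labels):
    """Return (intra-class fraction, inter-class fraction) of an undirected edge list."""
    if not edges:
        return 0.0, 0.0
    # An edge is intra-class when both endpoints carry the same label.
    intra = sum(1 for u, v in edges if labels[u] == labels[v])
    return intra / len(edges), (len(edges) - intra) / len(edges)
```

For the path `0-1-2-3` with labels `a, a, b, b`, two of the three edges are intra-class.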

Theorems & Definitions (3)

  • Definition 1
  • Definition 2
  • Theorem 1