Disentangled Structural and Featural Representation for Task-Agnostic Graph Valuation

Ali Falahati, Mohammad Mohammadi Amiri

TL;DR

This work addresses graph data valuation in data marketplaces without task-specific validation by introducing a double-blind, task-agnostic framework, blind message passing (BMP), that aligns buyer and seller graphs using a proxy graph and graph matching. It defines structural disparity $S$ via a graph Wasserstein distance, and featural diversity $D$ and relevance $R$ through covariance-based statistics. The method keeps each party's dataset hidden from the other while still allowing cross-graph comparison, and experiments on real datasets from several graph domains support its effectiveness. The findings suggest these metrics can guide dataset ranking and selection in practice, with potential impact on scalable, privacy-conscious data marketplaces for graph-structured data.

Abstract

With the emergence of data marketplaces, the demand for methods to assess the value of data has increased significantly. While numerous techniques have been proposed for this purpose, none have specifically addressed graphs as the main data modality. Graphs are widely used across various fields, ranging from chemical molecules to social networks. In this study, we break down graphs into two main components: structural and featural, and we focus on evaluating data without relying on specific task-related metrics, making it applicable in practical scenarios where validation requirements may be lacking. We introduce a novel framework called blind message passing, which aligns the seller's and buyer's graphs using a shared node permutation based on graph matching. This allows us to utilize the graph Wasserstein distance to quantify the differences in the structural distribution of graph datasets, called the structural disparities. We then consider featural aspects of buyers' and sellers' graphs for data valuation and capture their statistical similarities and differences, referred to as relevance and diversity, respectively. Our approach ensures that buyers and sellers remain unaware of each other's datasets. Our experiments on real datasets demonstrate the effectiveness of our approach in capturing the relevance, diversity, and structural disparities of seller data for buyers, particularly in graph-based data valuation scenarios.

Paper Structure

This paper contains 17 sections, 21 equations, 4 figures, 4 tables, and 3 algorithms.

Figures (4)

  • Figure 1: The BMP framework for task-agnostic graph data valuation involves three steps: (Left) A trusted broker generates a random proxy graph and shares it with the buyer and seller, who then compute optimal permutations and embeddings. The buyer performs eigendecomposition on the covariance of her feature matrix to find eigenvalues and eigenvectors. (Middle) The buyer and seller send their embeddings to the broker, who computes the structural disparity $S$. (Right) The buyer and seller share their eigenvalues with the broker, who computes relevance $R$ and diversity $D$.
  • Figure 2: Node classification accuracy on the datasets provided in Table \ref{tab:datasets}, obtained via subset selection using the BMP framework.
  • Figure 3: Estimation of diversity and relevance for the FRANKENSTEIN dataset (left) and the MNIST dataset (right).
  • Figure 4: Pairwise scores of datasets from five fields: molecules, bioinformatics, computer vision, social media, and synthetic datasets.
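
The featural step in the Figure 1 caption (eigendecomposition of the buyer's feature covariance, then a comparison of eigenvalues by the broker) can be illustrated with a short sketch. This page does not give the formulas for $R$ and $D$, so everything below is an assumption made for illustration: the seller is assumed to measure her feature variance along the buyer's principal directions, relevance is taken as the geometric mean of min/max variance ratios, and diversity as the geometric mean of normalized variance gaps. The function names, the parameter `k`, and the smoothing constant `eps` are hypothetical and are not taken from the paper.

```python
import numpy as np

def buyer_spectrum(X_buyer, k):
    """Top-k eigenvalues and eigenvectors of the buyer's feature covariance."""
    cov = np.cov(X_buyer, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]        # indices of the k largest
    return vals[order], vecs[:, order]

def seller_variances(X_seller, vecs):
    """Variance of the seller's features along each buyer direction (assumed step)."""
    cov = np.cov(X_seller, rowvar=False)
    # v_k^T C v_k for every column v_k of vecs
    return np.einsum("ik,ij,jk->k", vecs, cov, vecs)

def relevance_diversity(lam_buyer, lam_seller, eps=1e-12):
    """Assumed scores: geometric means of per-direction overlap and gap ratios."""
    hi = np.maximum(lam_buyer, lam_seller) + eps
    lo = np.clip(np.minimum(lam_buyer, lam_seller), 0.0, None)
    gap = np.abs(lam_buyer - lam_seller)
    relevance = np.exp(np.mean(np.log(lo / hi + eps)))
    diversity = np.exp(np.mean(np.log(gap / hi + eps)))
    return relevance, diversity

# Toy data standing in for node feature matrices (rows are nodes).
rng = np.random.default_rng(0)
X_buyer = rng.normal(size=(200, 5))
X_seller = 10.0 * X_buyer                     # same directions, 100x the variance

lam_b, V = buyer_spectrum(X_buyer, k=3)
lam_s = seller_variances(X_seller, V)
R, D = relevance_diversity(lam_b, lam_s)
print(f"relevance={R:.4f}, diversity={D:.4f}")
```

Under these assumed definitions, a seller whose features match the buyer's spectrum scores relevance near 1 and diversity near 0, while a seller with much larger or smaller variance along the same directions shifts weight from relevance to diversity. The structural disparity $S$ is not covered here, since it depends on the graph matching and embedding steps that this page only names.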