Precedence-Constrained Winter Value for Effective Graph Data Valuation
Hongliang Chi, Wei Jin, Charu Aggarwal, Yao Ma
TL;DR
Graph data valuation must account for inter-node dependencies and unlabeled elements, both of which challenge traditional i.i.d.-based methods like Data Shapley. We introduce PC-Winter, a precedence-constrained Winter value defined on a computation-tree-based contribution structure that captures hierarchical, unilateral dependencies among graph elements. The method uses DFS-generated permissible permutations and three efficiency techniques (permutation sampling, hierarchical truncation, and local propagation) to enable streaming estimation of per-player values $\psi_p(\mathcal{P}, U)$, from which node and edge valuations are then derived. Empirical results on six real-world graphs show that PC-Winter consistently outperforms Data Shapley in identifying high-value nodes and edges while offering substantial computational speedups, underscoring its practical utility for graph data monetization and model-training guidance. Future work includes further scaling and extending PC-Winter to heterogeneous graphs, broadening its applicability across graph-centric domains.
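To make the idea of DFS-generated permissible permutations concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a toy contribution tree (`TREE`), an additive placeholder utility (`toy_utility` with hypothetical `WEIGHTS`), and that each player's value is estimated as its average marginal contribution over sampled permutations. A randomized DFS pre-order guarantees that every player appears after its ancestors; hierarchical truncation and local propagation are not shown.

```python
import random

# Hypothetical contribution tree: a dummy root, two labeled nodes,
# and their neighbors as children. All names are illustrative only.
TREE = {
    "root": ["v1", "v2"],
    "v1": ["v1/a", "v1/b"],
    "v2": ["v2/a"],
    "v1/a": [], "v1/b": [], "v2/a": [],
}

# Placeholder per-player weights; a real utility would be a model's
# validation performance after training on the coalition.
WEIGHTS = {"v1": 4.0, "v2": 2.0, "v1/a": 1.0, "v1/b": 0.5, "v2/a": 0.25}


def toy_utility(coalition):
    """Additive stand-in for U(S): the sum of the members' weights."""
    return sum(WEIGHTS[p] for p in coalition)


def dfs_permutation(tree, root, rng):
    """Randomized DFS pre-order over the tree, excluding the dummy root.

    Children are shuffled at each level, so each call yields a random
    ordering in which a player always follows its ancestors and each
    subtree occupies a contiguous block.
    """
    order = []

    def visit(player):
        order.append(player)
        children = list(tree[player])
        rng.shuffle(children)
        for child in children:
            visit(child)

    top_level = list(tree[root])
    rng.shuffle(top_level)
    for player in top_level:
        visit(player)
    return order


def estimate_values(tree, root, utility, num_permutations, seed=0):
    """Average marginal contribution of each player over sampled permutations."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in tree if p != root}
    for _ in range(num_permutations):
        coalition = set()
        previous = utility(coalition)
        for player in dfs_permutation(tree, root, rng):
            coalition.add(player)
            current = utility(coalition)
            totals[player] += current - previous  # marginal contribution
            previous = current
    return {p: total / num_permutations for p, total in totals.items()}


values = estimate_values(TREE, "root", toy_utility, num_permutations=50)
```

Because the placeholder utility is additive, every marginal contribution equals the player's own weight, so `values` simply recovers `WEIGHTS`. With a non-additive utility, such as model accuracy, the estimates would depend on which permissible orderings are sampled.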
Abstract
Data valuation is essential for quantifying data's worth, aiding in assessing data quality and determining fair compensation. While existing data valuation methods have proven effective for Euclidean data, they face limitations when applied to the increasingly popular graph-structured data. In particular, graph data valuation introduces unique challenges, primarily stemming from the intricate dependencies among nodes and the exponential growth in value estimation costs. To address these challenges, we propose the Precedence-Constrained Winter (PC-Winter) Value, which accounts for the complex graph structure. Furthermore, we develop a variety of strategies to address the computational challenges and enable efficient approximation of PC-Winter. Extensive experiments demonstrate the effectiveness of PC-Winter across diverse datasets and tasks.
