Neural Dynamic Data Valuation: A Stochastic Optimal Control Approach
Zhangyong Liang, Ji Zhang, Xin Wang, Pengfei Zhang, Zhao Li
TL;DR
NDDV reframes data valuation as a stochastic optimal control problem to capture how the utility of data evolves during training. It introduces a forward–backward stochastic differential equation framework that models data-state trajectories and links a terminal utility to the co-state, enabling one-pass valuation without retraining. A fairness-aware mean-field reweighting mechanism adapts sample influence to reduce bias across heterogeneous data groups, with meta-learned weights controlling the terminal costs. An interpretable valuation function based on Kolmogorov–Arnold Networks with Matérn kernels exposes how data contributions evolve across layers and epochs, improving auditability. Experiments on six datasets show substantial runtime gains (up to 58×), improved robustness to corrupted data, and better fairness metrics, highlighting NDDV’s scalability and transparency for large-scale data marketplaces and learning systems.
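The summary above does not spell out the dynamics, so the following is only a toy sketch of the general forward-backward idea, not the NDDV method. It assumes a linear mean-field drift dX_i = k (mean(X) - X_i) dt + sigma dW_i for each sample's state, simulated forward with the Euler–Maruyama scheme. It also assumes a quadratic terminal cost Phi(x) = 0.5 (x - target)^2, whose terminal co-state p_T = -grad Phi(X_T) is read off as a per-sample score. The function names, the drift, the cost, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_mean_field_sde(x0, drift_strength=1.0, sigma=0.1, T=1.0, n_steps=100, seed=0):
    """Euler-Maruyama simulation of a toy mean-field SDE (assumed dynamics).

    dX_i = drift_strength * (mean(X) - X_i) dt + sigma dW_i
    Each sample's state is pulled toward the population mean (the "mean field").
    Returns the full trajectory with shape (n_steps + 1, n_samples).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.asarray(x0, dtype=float).copy()
    path = [x.copy()]
    for _ in range(n_steps):
        mean_field = x.mean()  # empirical mean of all sample states
        dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # Brownian increments
        x = x + drift_strength * (mean_field - x) * dt + sigma * dW
        path.append(x.copy())
    return np.stack(path)


def terminal_costate(x_T, target):
    """Terminal co-state p_T = -grad Phi(x_T) for the assumed cost Phi(x) = 0.5 * (x - target)**2."""
    return -(np.asarray(x_T, dtype=float) - target)


# Usage: four hypothetical samples; the outlier at 3.0 ends far from the target
# and therefore receives the most negative toy score.
x0 = np.array([0.0, 0.5, 1.0, 3.0])
path = simulate_mean_field_sde(x0)
scores = terminal_costate(path[-1], target=1.0)
print(path.shape, scores)
```

In this toy setting the linear drift conserves the population mean when sigma is zero, so any spread in the terminal scores comes only from each sample's starting offset and the noise. A learned drift, meta-learned terminal-cost weights, or a backward equation for the co-state at intermediate times would all be further assumptions beyond this sketch.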
Abstract
Data valuation has become a cornerstone of the modern data economy, where datasets function as tradable intellectual assets that drive decision-making, model training, and market transactions. Despite substantial progress, existing valuation methods remain limited by high computational cost, weak fairness guarantees, and poor interpretability, which hinder their deployment in large-scale, high-stakes applications. This paper introduces Neural Dynamic Data Valuation (NDDV), a new framework that formulates data valuation as a stochastic optimal control problem to capture the dynamic evolution of data utility over time. Unlike static combinatorial approaches, NDDV models data interactions through continuous trajectories that reflect both individual and collective learning dynamics.
