Neural Dynamic Data Valuation: A Stochastic Optimal Control Approach

Zhangyong Liang, Ji Zhang, Xin Wang, Pengfei Zhang, Zhao Li

TL;DR

NDDV reframes data valuation as a stochastic optimal control problem to capture the dynamic, time-evolving utility of data during training. It introduces a forward–backward stochastic differential equation framework that models data-state trajectories and links a terminal utility to the co-state, enabling one-pass valuation without retraining. A fairness-aware mean-field reweighting mechanism adapts sample influence to reduce bias across heterogeneous data groups, with meta-learned weights controlling the terminal costs. An interpretable valuation function based on Kolmogorov–Arnold Networks with Matérn kernels exposes how data contributions evolve across layers and epochs, improving auditability. Experiments on six datasets show substantial runtime gains (up to 58×), improved robustness to corrupted data, and better fairness metrics, highlighting NDDV’s scalability and transparency for large-scale data marketplaces and learning systems.
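The TL;DR mentions a valuation function built from Kolmogorov–Arnold Networks (KANs) with Matérn kernels, but the summary does not give the construction. The sketch below only illustrates the general idea under stated assumptions: each KAN edge function is a weighted sum of Matérn basis functions with smoothness ν = 3/2 on a fixed grid of centers, and a KAN layer sums those edge functions over its inputs. The names `matern32`, `kan_edge`, and `kan_layer` are hypothetical.

```python
import math


def matern32(x: float, center: float, length_scale: float = 1.0) -> float:
    """Matern kernel with smoothness nu = 3/2 (an ASSUMED choice of nu):
    k(r) = (1 + sqrt(3) r / l) * exp(-sqrt(3) r / l), with r = |x - center|."""
    s = math.sqrt(3.0) * abs(x - center) / length_scale
    return (1.0 + s) * math.exp(-s)


def kan_edge(x, centers, coeffs, length_scale=1.0):
    """Hypothetical KAN edge function phi(x): a learnable univariate function
    expressed as a weighted sum of Matern basis functions at fixed centers."""
    return sum(c * matern32(x, mu, length_scale) for c, mu in zip(coeffs, centers))


def kan_layer(inputs, centers, coeff_table, length_scale=1.0):
    """One KAN layer: output j = sum_i phi_{j,i}(x_i).
    coeff_table has shape [n_out][n_in][n_basis]."""
    return [
        sum(
            kan_edge(x_i, centers, coeff_table[j][i], length_scale)
            for i, x_i in enumerate(inputs)
        )
        for j in range(len(coeff_table))
    ]
```

For example, `matern32(0.0, 0.0)` returns 1.0 and the basis decays with distance from its center. Swapping in another ν would only change `matern32`; the layer structure stays the same.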

Abstract

Data valuation has become a cornerstone of the modern data economy, where datasets function as tradable intellectual assets that drive decision-making, model training, and market transactions. Despite substantial progress, existing valuation methods remain limited by high computational cost, weak fairness guarantees, and poor interpretability, which hinder their deployment in large-scale, high-stakes applications. This paper introduces Neural Dynamic Data Valuation (NDDV), a new framework that formulates data valuation as a stochastic optimal control problem to capture the dynamic evolution of data utility over time. Unlike static combinatorial approaches, NDDV models data interactions through continuous trajectories that reflect both individual and collective learning dynamics.
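To make "continuous trajectories that reflect both individual and collective learning dynamics" concrete, here is a minimal Euler-Maruyama sketch of a mean-field SDE. The drift u_i + κ(m_w − x_i), the time-constant control u_i, and the weighted mean field m_w = Σ_i w_i x_i / Σ_i w_i (uniform weights for the plain mean field of Figure 2a, non-uniform weights for the reweighted setting of Figure 2b) are assumptions for illustration, not the dynamics defined in the paper. The backward (co-state) equation of the FBSDE is omitted, and the function names are hypothetical.

```python
import numpy as np


def weighted_mean_field(x: np.ndarray, w) -> np.ndarray:
    """Weighted mean-field state m_w = sum_i w_i x_i / sum_i w_i.
    x has shape (n, d); w has shape (n,). Uniform w gives the plain mean."""
    w = np.asarray(w, dtype=float)
    return (w[:, None] * x).sum(axis=0) / w.sum()


def simulate_states(x0, weights, control, kappa=1.0, sigma=0.1,
                    T=1.0, steps=100, seed=0):
    """Euler-Maruyama simulation of the ASSUMED forward dynamics
        dx_i = [u_i + kappa * (m_w(t) - x_i)] dt + sigma dW_i.
    x0: (n, d) initial data states; control: (n, d) constant controls u_i.
    Returns the full path with shape (steps + 1, n, d)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    control = np.asarray(control, dtype=float)
    dt = T / steps
    path = [x.copy()]
    for _ in range(steps):
        m = weighted_mean_field(x, weights)        # collective (mean-field) term
        drift = control + kappa * (m - x)          # individual control + pull toward m_w
        noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + drift * dt + noise
        path.append(x.copy())
    return np.stack(path)
```

With `sigma=0.0`, zero control, and uniform weights, the mean-field pulls sum to zero across points, so the mean state is preserved while the spread of the states contracts toward it.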

Paper Structure (21 sections, 1 theorem, 25 equations, 11 figures, 5 tables, 1 algorithm)

Key Result

Lemma 1

Let $\mathcal{V}(\cdot;\theta)$ be Lipschitz-continuous with range $[0,1]$, and let the data-state dynamics follow Eq. eqn:dx. Then, for any subgroup $G_i$, the fairness violation under meta-reweighting is bounded in terms of $\Delta_{\mathcal{V}}$ and $C$, where $\Delta_{\mathcal{V}}=\max_{(x,y)}\mathcal{V}(\Phi(x,y);\theta)-\min_{(x,y)}\mathcal{V}(\Phi(x,y);\theta)$ and $C$ is a constant that depends on the Lipschitz constant of $\Phi$.
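The two quantities that the bound in Lemma 1 depends on can be estimated on a finite sample. The sketch below computes $\Delta_{\mathcal{V}}$ from a list of valuation outputs and an empirical Lipschitz ratio for a map such as $\Phi$. The helper names are hypothetical, and the empirical ratio only lower-bounds the true Lipschitz constant, since it inspects finitely many pairs.

```python
import math
from itertools import combinations


def value_range(values):
    """Delta_V = max V - min V over the evaluated points.
    Valuation outputs are assumed to lie in [0, 1], as Lemma 1 requires."""
    if not values:
        raise ValueError("need at least one valuation output")
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("valuation outputs must lie in [0, 1]")
    return max(values) - min(values)


def empirical_lipschitz(points, fn):
    """Largest observed ratio ||fn(a) - fn(b)|| / ||a - b|| over all sample pairs.
    Points and fn outputs are coordinate sequences (e.g. tuples).
    The result is only a lower bound on the true Lipschitz constant of fn."""
    best = 0.0
    for a, b in combinations(points, 2):
        gap = math.dist(a, b)
        if gap > 0.0:
            best = max(best, math.dist(fn(a), fn(b)) / gap)
    return best
```

For instance, `value_range([0.2, 0.9, 0.5])` is 0.7 (up to floating-point rounding), and the map $p \mapsto 2p$ on 1-D points yields an empirical ratio of 2.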

Figures (11)

  • Figure 1: Neural dynamic data valuation schematic and results. The panel compares NDDV with existing methods. NDDV turns the static combinatorial evaluation used by existing data valuation methods into a dynamic optimization process, defining a new utility function and a dynamic marginal contribution. Unlike existing methods, NDDV requires only a single training run to determine the values of all data points, which substantially improves computational efficiency. Using the half-moon dataset as an example, we show selected NDDV results to illustrate its effectiveness.
  • Figure 2: Schematic of learning stochastic data dynamics. a. In stochastic optimal control, data points obtain their optimal state trajectories through dynamic interactions with the mean-field state. b. Under the data reweighting strategy, data points are heterogeneous; they interact dynamically with the weighted mean-field state, which determines their optimal state trajectories.
  • Figure 3: Data value trajectories at the layer-wise level.
  • Figure 4: Data value trajectories at the epoch-wise level.
  • Figure 5: Revealing the data valuation process. Using the 2dplanes dataset as an example, we assess how removing and adding high- and low-value trajectories affects test accuracy.
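The removal side of the Figure 5 experiment can be sketched as a small harness. This is a hypothetical illustration, not the paper's protocol: training points are ranked by their assigned value, the top (or bottom) fraction is dropped, and a caller-supplied evaluator scores the remaining set. The nearest-centroid evaluator on 1-D features is a toy stand-in for the real model, and the addition experiment is not shown.

```python
def removal_curve(train, values, evaluate, fractions, high_first=True):
    """Drop the highest-valued (high_first=True) or lowest-valued training
    points, then re-evaluate. Returns a list of (fraction, score) pairs."""
    order = sorted(range(len(train)), key=lambda i: values[i], reverse=high_first)
    curve = []
    for f in fractions:
        k = int(round(f * len(train)))          # number of points to remove
        removed = set(order[:k])
        kept = [p for i, p in enumerate(train) if i not in removed]
        curve.append((f, evaluate(kept)))
    return curve


def nearest_centroid_accuracy(train, test):
    """Toy evaluator: label each test point (x, y) by the closest class mean
    computed from the 1-D training points; returns accuracy on `test`."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    if not counts:
        return 0.0
    centroids = {y: sums[y] / counts[y] for y in counts}
    correct = sum(
        1 for x, y in test
        if min(centroids, key=lambda c: abs(x - centroids[c])) == y
    )
    return correct / len(test)
```

In a toy set with one mislabeled outlier that receives the lowest value, dropping the lowest-valued 20% of points removes the outlier and raises test accuracy, whereas dropping the highest-valued 20% does not.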

Theorems & Definitions (5)

  • Definition 1: Leave-One-Out (LOO) Metric
  • Definition 2: Static Marginal Contribution
  • Definition 3: Shapley Value
  • Lemma 1: Bounded Fairness Violation under Meta-Reweighting
  • Proof of Lemma 1 (sketch)
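Definitions 1 to 3 are listed only by name. The sketch below uses their standard textbook forms, which may differ in notation from the paper's: $\mathrm{LOO}_i = U(D) - U(D \setminus \{i\})$, the static marginal contribution $U(S \cup \{i\}) - U(S)$, and the Shapley value, which averages that contribution over subsets $S \subseteq D \setminus \{i\}$ with weight $|S|!\,(n-|S|-1)!/n!$. The utility `U` is any set function supplied by the caller. Exact enumeration needs $O(2^n)$ utility evaluations per point, which illustrates the computational cost of static combinatorial valuation.

```python
from itertools import combinations
from math import factorial


def loo_values(n, utility):
    """Leave-one-out: LOO_i = U(D) - U(D \\ {i}) for D = {0, ..., n-1}."""
    full = frozenset(range(n))
    u_full = utility(full)
    return [u_full - utility(full - {i}) for i in range(n)]


def shapley_values(n, utility):
    """Exact Shapley values by enumerating every subset S of D \\ {i}:
    phi_i = sum_S |S|! (n - |S| - 1)! / n! * [U(S + {i}) - U(S)]."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # static marginal contribution of point i to subset s
                phi[i] += weight * (utility(s | {i}) - utility(s))
    return phi
```

For a utility in which points 0 and 1 are exact duplicates, for example `U(S) = 1.0 if 0 in S or 1 in S else 0.0`, plus 0.5 whenever point 2 is in `S`, LOO assigns both duplicates a value of zero, whereas the Shapley value splits their shared contribution and gives each 0.5.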