
Gradient-Based Data Valuation Improves Curriculum Learning for Game-Theoretic Motion Planning

Shihao Li, Jiachen Li, Dongmei Chen

Abstract

We demonstrate that gradient-based data valuation produces curriculum orderings that significantly outperform metadata-based heuristics for training game-theoretic motion planners. Specifically, we apply TracIn gradient-similarity scoring to GameFormer on the nuPlan benchmark and construct a curriculum that weights training scenarios by their estimated contribution to validation loss reduction. Across three random seeds, the TracIn-weighted curriculum achieves a mean planning ADE of $1.704\pm0.029$\,m, significantly outperforming the metadata-based interaction-difficulty curriculum ($1.822\pm0.014$\,m; paired $t$-test $p=0.021$, Cohen's $d_z=3.88$) while exhibiting lower variance than the uniform baseline ($1.772\pm0.134$\,m). Our analysis reveals that TracIn scores and scenario metadata are nearly orthogonal (Spearman $\rho=-0.014$), indicating that gradient-based valuation captures training dynamics invisible to hand-crafted features. We further show that gradient-based curriculum weighting succeeds where hard data selection fails: TracIn-curated 20\% subsets degrade performance by $2\times$, whereas full-data curriculum weighting with the same scores yields the best results. These findings establish gradient-based data valuation as a practical tool for improving sample efficiency in game-theoretic planning.
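TracIn scores a training sample by the alignment between its loss gradient and the validation-batch gradient, accumulated over saved checkpoints. The abstract does not spell out the implementation, so the following is a minimal sketch for a linear least-squares model with closed-form gradients; the checkpoint list, learning rates, and array names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def grad_mse(w, X, y):
    # Gradient of mean-squared error for a linear model y_hat = X @ w.
    return 2.0 * X.T @ (X @ w - y) / len(y)

def tracin_score(checkpoints, lrs, X_tr, y_tr, X_val, y_val):
    # TracIn-style score: sum over checkpoints of lr * <g_train, g_val>.
    # A positive score means the sample's gradient step reduces validation loss.
    return sum(lr * grad_mse(w, X_tr, y_tr) @ grad_mse(w, X_val, y_val)
               for w, lr in zip(checkpoints, lrs))

# Toy demo (all data randomly generated for illustration).
rng = np.random.default_rng(0)
X_val, y_val = rng.normal(size=(32, 5)), rng.normal(size=32)
w0 = rng.normal(size=5)
checkpoints, lrs = [w0, 0.9 * w0], [1e-2, 1e-2]

# Scoring the validation batch against itself gives <g, g> >= 0 at each
# checkpoint, so the score is nonnegative by construction.
s_self = tracin_score(checkpoints, lrs, X_val, y_val, X_val, y_val)
# Score of a single training sample against the validation batch.
s_one = tracin_score(checkpoints, lrs, X_val[:1], y_val[:1], X_val, y_val)
```

In the paper's setting the model is GameFormer and the gradients come from autodiff over planner parameters, but the scoring rule is the same inner-product accumulation.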


Figures (7)

  • Figure 1: Method overview. Phase 1: Three scoring methods assign per-sample importance. Phase 2: A three-phase curriculum converts scores to weights; weighted training preserves coverage while hard selection fails. Phase 3: GameFormer trained with gradient-based curriculum achieves lower ADE ($p{=}0.021$) and reduced variance.
  • Figure 2: Score correlation analysis. Left: Spearman rank correlation heatmap showing near-orthogonality of TracIn and metadata scores ($\rho=-0.014$). Right: Scatter plot of TracIn vs. metadata scores for each training scenario, colored by scoring tier. The two scoring methods capture fundamentally different aspects of data utility.
  • Figure 3: Curriculum weight schedule. (a) Weight trajectories over training epochs for samples at different score levels. All samples start at $w=1$ during warm-up (epochs 1--3), ramp up during epochs 3--8, and stabilize at maximum differentiation. (b) Distribution of final weights ($e=20$) across all 5,148 training scenarios, showing smooth differentiation centered around $w \approx 2$.
  • Figure 4: Representative scenarios from four quadrants of the TracIn $\times$ metadata scoring space. Top-left: low metadata but high TracIn---few nearby agents, yet the ego executes a turn that strongly aligns with validation gradient. Bottom-right: high metadata but low TracIn---many close agents in parallel lanes, yet the gradient opposes validation improvement. The off-diagonal quadrants illustrate the orthogonality of the two scoring methods ($\rho=-0.014$). Blue/red lines: ego past/future; gray/orange: agent past/future.
  • Figure 5: Multi-seed planning ADE comparison ($n{=}3$ seeds per method). Horizontal bars: mean; whiskers: $\pm1$ s.d.; shaped markers: individual seeds. Y-axis is broken to accommodate the Loss SPL outlier (seed 2024, ADE$=2.555$). The TracIn curriculum achieves the lowest mean ADE with the tightest seed clustering. $^*p{=}0.021$ (paired $t$-test, TracIn vs. Metadata).
  • ...and 2 more figures
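Figure 3 describes the three-phase weight schedule: all samples enter at $w=1$ during warm-up (epochs 1--3), weights ramp during epochs 3--8, and then hold at full differentiation, with final weights centered around $w \approx 2$. The exact weighting function is not given in this excerpt; the sketch below assumes a linear map from a sample's TracIn score percentile to its weight, chosen so that the final distribution matches the caption's description.

```python
import numpy as np

def curriculum_weight(score_pct, epoch, warmup=3, ramp_end=8, spread=2.0):
    """Map score percentiles in [0, 1] to per-sample training weights.

    Assumed form (illustrative): w = 1 + alpha * spread * percentile, where
    alpha is 0 during warm-up and ramps linearly to 1 by `ramp_end`.
    """
    alpha = np.clip((epoch - warmup) / (ramp_end - warmup), 0.0, 1.0)
    return 1.0 + alpha * spread * np.asarray(score_pct, dtype=float)

# Demo: samples at the bottom, middle, and top of the score ranking.
pct = np.array([0.0, 0.5, 1.0])
w_warmup = curriculum_weight(pct, epoch=1)   # warm-up: uniform weights
w_final = curriculum_weight(pct, epoch=20)   # full differentiation
```

Under this assumed form, uniformly distributed percentiles give a final mean weight of 2, matching the distribution shown in Figure 3(b); weighted training keeps every scenario in the data loader, which is how the curriculum preserves coverage where hard 20\% selection does not.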