Learning from Complexity: Exploring Dynamic Sample Pruning of Spatio-Temporal Training

Wei Chen, Junle Chen, Yuqian Wu, Yuxuan Liang, Xiaofang Zhou

TL;DR

ST-Prune tackles training inefficiency in spatio-temporal forecasting by dynamically pruning training samples. It introduces a complexity-informed scoring metric to identify informative samples and a stationarity-aware gradient rescaling to preserve distributional balance, coupled with an annealed training schedule. Across real-world datasets and foundation-model scales, ST-Prune yields substantial per-epoch speedups while maintaining or improving forecasting accuracy, and it demonstrates universality across backbones, optimizers, and tasks. This data-centric approach holds promise for scalable, efficient spatio-temporal learning in large-scale settings.

Abstract

Spatio-temporal forecasting is fundamental to intelligent systems in transportation, climate science, and urban planning. However, training deep learning models on the massive, often redundant datasets from these domains presents a significant computational bottleneck. Existing solutions typically focus on optimizing model architectures or optimizers, while overlooking the inherent inefficiency of the training data itself. The conventional approach of iterating over the entire static dataset each epoch wastes considerable resources on easy-to-learn or repetitive samples. In this paper, we explore a novel training-efficiency technique for spatio-temporal forecasting, namely learning from complexity with dynamic sample pruning (ST-Prune). Through dynamic sample pruning, we aim to identify the most informative samples based on the model's real-time learning state, thereby accelerating convergence and improving training efficiency. Extensive experiments on real-world spatio-temporal datasets show that ST-Prune significantly accelerates training while maintaining or even improving model performance, and that it is scalable and universal.
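
To make the idea of dynamic sample pruning concrete, the following is a minimal sketch of one generic scheme, not ST-Prune itself. It assumes the score is simply a per-sample loss from the previous pass, drops below-average samples at random, upweights the survivors so the expected gradient contribution of that group is preserved, and trains on the full dataset during the final epochs. The function name `select_samples` and the parameters `prune_ratio` and `anneal_frac` are hypothetical; the paper's complexity-informed score and stationarity-aware rescaling are not reproduced here.

```python
import random
from statistics import fmean


def select_samples(scores, prune_ratio, epoch, total_epochs,
                   anneal_frac=0.125, rng=None):
    """Pick the samples to train on this epoch and their loss weights.

    scores      -- per-sample score from the previous pass (assumption:
                   higher means more informative, e.g. last-seen loss)
    prune_ratio -- probability of dropping a below-average sample (< 1)
    anneal_frac -- fraction of final epochs that use the full dataset
    Returns (kept_indices, weights), two lists of equal length.
    """
    if rng is None:
        rng = random.Random()
    n = len(scores)

    # Annealing: near the end of training, fall back to the full dataset.
    if epoch >= int(total_epochs * (1 - anneal_frac)):
        return list(range(n)), [1.0] * n

    threshold = fmean(scores)
    kept, weights = [], []
    for i, score in enumerate(scores):
        if score >= threshold:
            # Informative samples are always kept with unit weight.
            kept.append(i)
            weights.append(1.0)
        elif rng.random() >= prune_ratio:
            # Surviving low-score samples are upweighted so that, in
            # expectation, the low-score group contributes as before.
            kept.append(i)
            weights.append(1.0 / (1.0 - prune_ratio))
    return kept, weights
```

In a training loop, the returned weights would multiply the per-sample losses before averaging, and the scores would be refreshed with the losses observed during the epoch.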

Paper Structure

This paper contains 30 sections, 9 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: The spatio-temporal data redundancy characteristics and statistical properties along the spatial and temporal dimensions, exemplified by the PeMS08 dataset (Song et al., 2020). For more statistical information on other datasets, please refer to the datasets appendix.
  • Figure 2: Further analysis of spatio-temporal data insights. (a) Averaging Masking Effect: Low-error nodes dilute critical localized anomalies, necessitating a structural scoring mechanism beyond simple mean error. (b) Long-tail Stationarity Distribution: The dominance of stationary patterns motivates our stationarity-aware rescaling to prevent distribution shift and maintain representativeness.
  • Figure 3: The overall workflow of ST-Prune for efficient data pruning during spatio-temporal training.
  • Figure 4: The trade-off between per-epoch time and performance on UrbanEV. Specifically, we report the test performance when methods achieve per-epoch times of {10%, 30%, 50%, 70%, 90%} of the full-dataset training time. "Vanilla" denotes the full-dataset training result.
  • Figure 5: Performance and efficiency trade-offs without and with ST-Prune at different scales of the ST-foundation model OpenCity.
  • ...and 7 more figures
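
The averaging masking effect named in the Figure 2(a) caption can be illustrated with a small numeric sketch. The numbers are made up for illustration, and the top-k aggregation is a hypothetical stand-in for a "structural" score, not the paper's metric. The point is only that a plain mean over nodes can rank a sample with one severe localized error below a sample with mild errors everywhere.

```python
from statistics import fmean


def mean_score(node_errors):
    """Plain average of per-node errors for one sample."""
    return fmean(node_errors)


def topk_score(node_errors, k=1):
    """Average of the k largest per-node errors (illustrative only)."""
    return fmean(sorted(node_errors, reverse=True)[:k])


# Two hypothetical samples over 100 nodes.
uniform = [0.5] * 100             # mild error at every node
localized = [0.1] * 99 + [20.0]   # one node with a large anomaly

# The mean ranks the uniform sample higher: the 99 low-error nodes
# dilute the single anomalous node.
print(mean_score(uniform) > mean_score(localized))

# A score that looks at the largest node errors surfaces the anomaly.
print(topk_score(localized) > topk_score(uniform))
```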