Review of Data-centric Time Series Analysis from Sample, Feature, and Period
Chenxi Sun, Hongyan Li, Yaliang Li, Shenda Hong
TL;DR
This paper addresses the underexplored data-centric view of time series (TS) analysis, arguing that data quality and data selection shape TS task outcomes in ways that model design alone cannot. A TS sample is written $x=\{x^{d}_{i}\}_{i=1,d=1}^{T,D}$, indexed over $T$ time steps and $D$ feature dimensions. The paper organizes its taxonomy by sample, feature, and period, and surveys data filtering, augmentation, learning-order arrangement, feature augmentation, dimension reduction, and period-related choices. Its contributions are a structured synthesis of these methods, a discussion of their benefits and drawbacks, and a set of recommendations, open problems, and future directions. The work aims to guide dataset construction, data management, and data-centric AI deployment for TS, with implications for TS-LMs and domain-specific TS models.
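To make the three levels of the taxonomy concrete, the following is a minimal sketch, not taken from the paper. It assumes a dataset stored as a NumPy array of shape `(N, T, D)`, where `T` indexes time steps and `D` indexes feature dimensions as in the sample notation above. The array sizes, the missing-value filter, the chosen feature indices, and the 12-step window are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: N samples, each with T time steps and D features.
N, T, D = 8, 24, 3
X = rng.normal(size=(N, T, D))
X[2, 5:10, 1] = np.nan  # simulate one sample with missing values

# Sample level: drop samples that contain any missing value (a simple filter).
keep = ~np.isnan(X).any(axis=(1, 2))
X_samples = X[keep]

# Feature level: keep a subset of the D feature dimensions (illustrative indices).
X_features = X_samples[:, :, [0, 2]]

# Period level: keep only the most recent 12 time steps (illustrative window).
X_period = X_features[:, -12:, :]

print(X.shape, X_samples.shape, X_features.shape, X_period.shape)
```

Each step acts on a different axis of the same array, which is one way to read the sample, feature, and period split; the methods surveyed in the paper are considerably more elaborate than these slicing operations.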
Abstract
Data is essential to time series analysis with machine learning approaches, whether for classic models or today's large language models. A good time-series dataset benefits the model's accuracy, robustness, and convergence, as well as task outcomes and costs. The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data quality. Even though time-series data processing methods come up frequently in a wide range of research fields, they have not been well investigated as a specific topic. To fill this gap, we systematically review data-centric methods in time series analysis, covering a wide range of research topics. Based on the characteristics of time-series data at the sample, feature, and period levels, we propose a taxonomy for the reviewed data selection methods. In addition to discussing and summarizing their characteristics, benefits, and drawbacks for time-series data, we also introduce the challenges and opportunities by proposing recommendations, open problems, and possible research topics.
