Review of Data-centric Time Series Analysis from Sample, Feature, and Period

Chenxi Sun; Hongyan Li; Yaliang Li; Shenda Hong

TL;DR

This paper addresses the underexplored data-centric view of time series (TS) analysis, arguing that data quality and data selection influence TS tasks beyond model design. A TS sample is written as $x=\{x^{d}_{i}\}_{i=1,d=1}^{T,D}$, indexed by time step $i$ and feature dimension $d$. The paper provides a taxonomy by sample, feature, and period, and surveys data filtering, data augmentation, learning-order arrangement, feature augmentation, dimension reduction, and period-related choices such as window size setting and subsequence extraction. Its contributions include a structured synthesis of methods, a discussion of their benefits and drawbacks, and recommendations covering open problems and future directions. The work aims to guide dataset construction, data management, and data-centric AI deployment for TS, with implications for TS-LMs and domain-specific TS models.

Abstract

Data is essential to time series analysis with machine learning approaches, whether for classic models or today's large language models. A good time-series dataset benefits the model's accuracy, robustness, and convergence, as well as task outcomes and costs. The emergence of data-centric AI represents a shift from model refinement to prioritizing data quality. Even though time-series data processing methods come up frequently in a wide range of research fields, they have not been well investigated as a specific topic. To fill this gap, we systematically review data-centric methods in time series analysis, covering a wide range of research topics. Based on the characteristics of time-series data at the sample, feature, and period levels, we propose a taxonomy for the reviewed data selection methods. In addition to discussing and summarizing their characteristics, benefits, and drawbacks with respect to time-series data, we also introduce the challenges and opportunities by proposing recommendations, open problems, and possible research topics.

Paper Structure (16 sections, 1 figure, 3 tables)

Introduction
Time-Series Data Characteristics
Sample Selection
Data Filtering
Data Augmentation
Learning Order Arrangement
Feature Selection
Feature Augmentation
Dimension Reduction
Period Selection
Window Size Setting
Subsequence Extraction
Discussion
Open Problems
Potential Topics
...and 1 more section

Figures (1)

Figure 1: Data-Centric Time Series Analysis: A case of time series selection for data quantity reduction from the perspectives of sample (blue), feature (green), and period (yellow).
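
The three perspectives in Figure 1 can be pictured as selecting along three axes of a dataset. The sketch below is only an illustration under assumptions that are not from the paper: the toy dataset, the `select` helper, and the hand-picked indices are all hypothetical, and no particular selection method from the survey is implemented.

```python
# Toy dataset (hypothetical): N samples, each with T time steps and D features.
# The value at [n][t][d] encodes its own position so the result is easy to check.
N, T, D = 4, 6, 3
dataset = [
    [[n * 100 + t * 10 + d for d in range(D)] for t in range(T)]
    for n in range(N)
]


def select(data, sample_idx, feature_idx, period):
    """Keep the chosen samples, feature columns, and a half-open time window."""
    start, stop = period
    return [
        [[step[d] for d in feature_idx] for step in data[n][start:stop]]
        for n in sample_idx
    ]


# Hand-picked choices standing in for the output of real selection methods.
subset = select(dataset, sample_idx=[0, 2], feature_idx=[1], period=(2, 5))
shape = (len(subset), len(subset[0]), len(subset[0][0]))
print(shape)  # (samples kept, time steps kept, features kept)
```

Here the data volume drops from 4 x 6 x 3 values to 2 x 3 x 1, which is the kind of quantity reduction the figure caption refers to. In practice each index set would come from a method in the corresponding part of the taxonomy rather than being fixed by hand.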

Theorems & Definitions (3)

Definition 1: Sample
Definition 2: Feature
Definition 3: Period
