
ShapeCond: Fast Shapelet-Guided Dataset Condensation for Time Series Classification

Sijia Peng, Yun Xiong, Xi Chen, Yi Xie, Guanzhi Li, Yanwei Yu, Yangyong Zhu, Zhiqiang Shen

TL;DR

ShapeCond tackles the growth of time series data by introducing a shapelet-guided dataset condensation framework that preserves both local discriminative motifs and global temporal structure. It jointly optimizes a compact synthesized set via a dual-view process: global dynamics guided by a frozen teacher encoder and local motif constraints enforced through shapelet transforms, with BatchNorm statistics matched to the full data and soft teacher labels used for supervision. The approach achieves up to 29× speedups in synthesis, outperforms all prior time-series condensation methods on seven datasets, and enables effective downstream tasks such as neural architecture search with significantly reduced data. This work demonstrates that explicitly preserving shapelet knowledge in condensed data yields substantial accuracy gains while markedly reducing storage and computation, offering a scalable strategy for temporal data modeling in resource-constrained settings.

Abstract

Time series data supports many domains (e.g., finance and climate science), but its rapid growth strains storage and computation. Dataset condensation can alleviate this by synthesizing a compact training set that preserves key information. Yet most condensation methods are image-centric and often fail on time series because they miss time-series-specific temporal structure, especially local discriminative motifs such as shapelets. In this work, we propose ShapeCond, a novel and efficient condensation framework for time series classification that leverages shapelet-based dataset knowledge via a shapelet-guided optimization strategy. Our shapelet-assisted synthesis cost is independent of sequence length, so longer series yield larger speedups in synthesis (e.g., 29$\times$ faster than CondTSC, the prior state-of-the-art method for time-series condensation, and up to 10,000$\times$ faster than naively using shapelets on the Sleep dataset with 3,000 timesteps). By explicitly preserving critical local patterns, ShapeCond improves downstream accuracy and consistently outperforms all prior state-of-the-art time series dataset condensation methods across extensive experiments. Code is available at https://github.com/lunaaa95/ShapeCond.
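
The TL;DR names two supervision signals for the synthesized set: soft labels from a frozen teacher and BatchNorm statistics matched to the full data. As a rough illustration only, the NumPy sketch below shows one common way such terms are written. The function names, the squared-difference form of the statistics term, and the `(batch, channels, time)` layout are assumptions made for this example and are not taken from the ShapeCond code; the shapelet-based local term is not covered here.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def soft_label_loss(student_logits, teacher_logits):
    """Cross-entropy of student predictions against teacher soft labels.

    Both inputs have shape (batch, num_classes). Hypothetical helper.
    """
    p_teacher = softmax(teacher_logits)
    log_p_student = np.log(softmax(student_logits))
    return float(-(p_teacher * log_p_student).sum(axis=1).mean())


def bn_stat_loss(features, running_mean, running_var):
    """Squared gap between batch statistics and stored BatchNorm statistics.

    features: (batch, channels, time); running_mean, running_var: (channels,).
    Hypothetical helper; the squared-difference form is an assumption.
    """
    mean = features.mean(axis=(0, 2))
    var = features.var(axis=(0, 2))
    return float(((mean - running_mean) ** 2).sum() + ((var - running_var) ** 2).sum())
```

In a sketch like this, the synthesized series would be the variables being optimized, while the teacher logits and the running statistics stay fixed.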

Paper Structure (32 sections, 28 equations, 5 figures, 10 tables)

Figures (5)

  • Figure 1: Conceptual illustration of dataset condensation. The aim is to synthesize a small yet informative dataset from a large dataset, such that models trained on the condensed dataset can achieve performance comparable to those trained on the full dataset. This approach facilitates faster training and reduces storage requirements.
  • Figure 2: Shapelets are the most representative segments within classes. Thus, the classification of a time series with an unknown label can be determined by comparing its Euclidean distance to shapelet 1 (for class 1) with that to shapelet 2 (for class 2).
  • Figure 3: Shapelet-guided Data Synthesis Stage. Gradients from the model encoder control global temporal structure (lower box), while shapelet-guided optimization preserves critical local patterns (upper box).
  • Figure 4: Fast Shapelet Discovery. We reduce the computational cost by combining candidate pruning with position-constrained distance search. Distance computation is restricted to a local temporal neighborhood, including the candidate’s original position (red double arrows) and nearby temporal positions (black double arrows), yielding constant-time ($O(1)$) distance evaluations.
  • Figure 5: High-ratio pruning accelerates shapelet discovery while preserving dataset information. (a) Comparison of shapelet discovery processes with and without high-ratio pruning, showing significant acceleration. (b) Dataset information remains intact. This is evaluated via classification accuracy of models trained on the pruned dataset. Even with aggressive pruning (up to 70%), accuracy remains comparable to the original dataset, indicating that essential information is preserved.
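
The captions for Figures 2 and 4 describe two ideas that are easy to see in code: classifying a series by its Euclidean distance to each class's shapelet, and limiting the distance search to a small neighborhood around the shapelet's original position. The NumPy sketch below is a minimal illustration under stated assumptions. The function names, the `radius` parameter, and the toy data are hypothetical and do not come from the ShapeCond repository, and the paper's candidate pruning step is not shown.

```python
import numpy as np


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and every window of the series."""
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.min(np.linalg.norm(windows - shapelet, axis=1)))


def local_shapelet_distance(series, shapelet, origin, radius=2):
    """Same distance, restricted to start offsets within `radius` of `origin`.

    At most 2 * radius + 1 windows are compared, whatever the series length.
    Assumes 0 <= origin <= len(series) - len(shapelet).
    """
    last_start = len(series) - len(shapelet)
    lo = max(0, origin - radius)
    hi = min(last_start, origin + radius)
    dists = [
        np.linalg.norm(series[s:s + len(shapelet)] - shapelet)
        for s in range(lo, hi + 1)
    ]
    return float(min(dists))


def classify(series, shapelets):
    """Return the label whose shapelet is closest (shapelets: dict label -> array)."""
    return min(shapelets, key=lambda label: shapelet_distance(series, shapelets[label]))


# Toy data: a bump for class 1, its mirror image for class 2.
bump = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
shapelets = {"class_1": bump, "class_2": -bump}
series = np.zeros(50)
series[20:25] = bump  # the series contains the class-1 motif at offset 20
```

Here `classify(series, shapelets)` picks `"class_1"`, because the bump matches a window exactly. The full search compares 46 windows for this 50-step series, while `local_shapelet_distance(series, bump, origin=21)` compares only 5, and that count stays the same as the series grows longer.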