Dataset Condensation for Time Series Classification via Dual Domain Matching

Zhanyu Liu, Ke Hao, Guanjie Zheng, Yanwei Yu

TL;DR

This paper proposes CondTSC (Dataset Condensation for Time Series Classification via Dual Domain Matching), a framework for condensing time series classification datasets. It aims to generate a condensed dataset that matches the surrogate objectives in both the time and frequency domains.

Abstract

Time series data has been shown to be crucial in various research fields. Managing large quantities of time series data is challenging for deep learning tasks, particularly when training a deep neural network. Recently, a technique named Dataset Condensation has emerged as a solution to this problem. It generates a smaller synthetic dataset that achieves performance comparable to the full real dataset on downstream tasks such as classification. However, previous methods are primarily designed for image and graph datasets, and directly adapting them to time series datasets leads to suboptimal performance because they cannot effectively leverage the rich information inherent in time series data, particularly in the frequency domain. In this paper, we propose a novel framework named Dataset Condensation for Time Series Classification via Dual Domain Matching (CondTSC), which focuses on the time series classification dataset condensation task. Unlike previous methods, our framework aims to generate a condensed dataset that matches the surrogate objectives in both the time and frequency domains. Specifically, CondTSC incorporates multi-view data augmentation, dual domain training, and dual surrogate objectives to enhance the dataset condensation process in the time and frequency domains. Through extensive experiments, we demonstrate the effectiveness of our framework, which outperforms other baselines and learns a condensed synthetic dataset that exhibits desirable characteristics such as conforming to the distribution of the original data.

Paper Structure

This paper contains 27 sections, 19 equations, 7 figures, 6 tables, and 1 algorithm.

Figures (7)

  • Figure 1: The diagram illustrates the concept of time series data condensation, which aims to learn a small dataset that achieves comparable performance to the full dataset.
  • Figure 2: The diagram of CondTSC. LPF indicates low pass filter. FTPP indicates Fourier transform phase perturbation and FTMP indicates Fourier transform magnitude perturbation.
  • Figure 3: The t-SNE visualization of the Insect dataset in both the time domain and the frequency domain. It illustrates the intuition behind applying augmentation and using dual-domain information for time series data: the data in the frequency domain shows better decision boundaries, and the data augmentations squeeze the boundaries between different classes.
  • Figure 4: Accuracy (%) with larger condensed data sizes.
  • Figure 5: Frequency-domain and time-domain visualization, on the Insect dataset, of the learning process of synthetic data trained by CondTSC and MTT separately. We observe that the synthetic data trained by CondTSC conforms to the distribution of the real data and consequently achieves remarkable performance.
  • ...and 2 more figures
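
The Figure 2 caption names three operations: a low pass filter (LPF), Fourier transform phase perturbation (FTPP), and Fourier transform magnitude perturbation (FTMP). This listing does not give their definitions, so the following is only a minimal NumPy sketch of one common way such operations can be realized. The function names, the `keep_ratio` and `scale` parameters, and the Gaussian noise model are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def low_pass_filter(x, keep_ratio=0.5):
    """Zero the highest-frequency rFFT bins and transform back (assumed LPF form)."""
    spec = np.fft.rfft(x, axis=-1)
    # Number of low-frequency bins to keep; keep_ratio is a hypothetical knob.
    cutoff = max(1, int(spec.shape[-1] * keep_ratio))
    spec[..., cutoff:] = 0
    return np.fft.irfft(spec, n=x.shape[-1], axis=-1)


def phase_perturbation(x, scale=0.1, rng=None):
    """Add Gaussian noise to the phase spectrum (assumed FTPP form)."""
    rng = np.random.default_rng() if rng is None else rng
    spec = np.fft.rfft(x, axis=-1)
    mag, phase = np.abs(spec), np.angle(spec)
    phase = phase + rng.normal(0.0, scale, size=phase.shape)
    return np.fft.irfft(mag * np.exp(1j * phase), n=x.shape[-1], axis=-1)


def magnitude_perturbation(x, scale=0.1, rng=None):
    """Rescale the magnitude spectrum by Gaussian factors (assumed FTMP form)."""
    rng = np.random.default_rng() if rng is None else rng
    spec = np.fft.rfft(x, axis=-1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Multiplicative noise, clipped so magnitudes stay non-negative.
    mag = np.maximum(mag * (1.0 + rng.normal(0.0, scale, size=mag.shape)), 0.0)
    return np.fft.irfft(mag * np.exp(1j * phase), n=x.shape[-1], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A toy batch of 4 series of length 64 (synthetic data, not from the paper).
    batch = rng.normal(size=(4, 64))
    views = [
        low_pass_filter(batch),
        phase_perturbation(batch, rng=rng),
        magnitude_perturbation(batch, rng=rng),
    ]
    for view in views:
        print(view.shape)
```

Each function maps a real-valued array of shape `(..., length)` to an array of the same shape, so the resulting views can be stacked with the original series. Working through `rfft`/`irfft` keeps the output real without manually enforcing conjugate symmetry.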