Data Augmentation for Multivariate Time Series Classification: An Experimental Study
Romain Ilbert, Thai V. Hoang, Zonghua Zhang
TL;DR
The paper tackles data scarcity in multivariate time series classification by presenting a taxonomy of augmentation techniques and evaluating their effect on ROCKET and InceptionTime across 13 imbalanced UCR/UEA datasets. Augmentation improves classification accuracy on 10 of the 13 datasets, but no single technique dominates across all of them, which argues for diverse, potentially pipeline-based augmentation strategies. The study also offers a practical framework for applying augmentation to time series and points to directions for future research, including synergy among methods and domain-adaptive pipelines, with the aim of improving robustness and generalization under limited data.
Abstract
Our study investigates the impact of data augmentation on the performance of multivariate time series classification models, focusing on datasets from the UCR/UEA archive. Although these datasets are small, augmentation improved classification accuracy on 10 of 13 datasets with the ROCKET and InceptionTime models. This result highlights how much effective training depends on sufficient data, paralleling the advances seen in computer vision. Our work adapts existing augmentation methods and applies them in new ways to multivariate time series classification. By systematically analyzing and applying a variety of these techniques, we show that strategic data enrichment can enhance the accuracy of both traditional and deep learning models. The results provide a benchmark for future research on data scarcity in time series analysis and underscore the importance of varied augmentation approaches when data is limited.
