IBMA: An Imputation-Based Mixup Augmentation Using Self-Supervised Learning for Time Series Data
Dang Nha Nguyen, Hai Dang Nguyen, Khoa Tho Anh Nguyen
TL;DR
This work tackles the limited augmentation strategies in long-sequence time-series forecasting by introducing Imputation-Based Mixup Augmentation (IBMA), a two-phase framework built on Self-Supervised Reconstruction (SSR) and imputation-based augmentation. By combining imputed data with Mixup, the method creates diverse yet structurally coherent training samples, guiding forecasting models toward better generalization. Evaluations across DLinear, TimesNet, and iTransformer on ETTh1/ETTh2/ETTm1/ETTm2 show consistent improvements: gains in 22 of 24 cases, 10 of which are the best result among the compared methods, with notable benefits from the imputation-based approach. The approach appears robust across architectures and datasets, suggesting a promising path for more effective augmentation in time-series forecasting, while also indicating model- and dataset-specific nuances.
Abstract
Data augmentation in time series forecasting plays a crucial role in enhancing model performance by introducing variability while maintaining the underlying temporal patterns. However, time series data offers fewer augmentation strategies than domains such as images or text, and advanced techniques like Mixup are rarely used. In this work, we propose a novel approach, Imputation-Based Mixup Augmentation (IBMA), which combines imputation-augmented data with Mixup augmentation to bolster model generalization and improve forecasting performance. We evaluate the effectiveness of this method across several forecasting models, including DLinear (MLP), TimesNet (CNN), and iTransformer (Transformer), which represent some of the most recent advances in time series forecasting. Our experiments, conducted on four datasets (ETTh1, ETTh2, ETTm1, ETTm2) and compared against eight other augmentation techniques, demonstrate that IBMA consistently enhances performance, achieving improvements in 22 of 24 instances, 10 of which are the best results overall, particularly with iTransformer imputation.
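To make the idea of combining imputation-augmented data with Mixup concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the imputed series is mixed with the original window by a convex combination whose weight is drawn from a Beta distribution, as in standard Mixup. Linear interpolation stands in for the learned self-supervised imputation model, and the names `mask_and_impute`, `imputation_mixup`, `alpha`, and `mask_ratio` are hypothetical.

```python
import numpy as np

def mask_and_impute(x, mask_ratio, rng):
    """Mask random time steps per channel and fill them back in.

    Assumption: linear interpolation is a stand-in for the learned
    imputation model described in the abstract. Requires mask_ratio < 1.
    """
    x = np.asarray(x, dtype=float)
    steps = np.arange(x.shape[0])
    imputed = x.copy()
    n_mask = int(round(mask_ratio * x.shape[0]))
    for j in range(x.shape[1]):
        masked = rng.choice(x.shape[0], size=n_mask, replace=False)
        keep = np.setdiff1d(steps, masked)  # sorted indices of observed steps
        imputed[masked, j] = np.interp(masked, keep, x[keep, j])
    return imputed

def imputation_mixup(x, alpha=0.5, mask_ratio=0.25, seed=0):
    """Mix a window of shape (time, channels) with its imputed version.

    Assumption: lam ~ Beta(alpha, alpha), as in standard Mixup.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    imputed = mask_and_impute(x, mask_ratio, rng)
    lam = rng.beta(alpha, alpha)
    mixed = lam * x + (1.0 - lam) * imputed  # convex combination
    return mixed, lam

# Usage: augment one toy window with 48 time steps and 3 channels.
window = np.sin(np.linspace(0.0, 6.28, 48)).reshape(-1, 1) * np.ones((1, 3))
augmented, lam = imputation_mixup(window)
```

Because the mixed sample is a convex combination of the original and an interpolated copy, it stays within the value range of the original window, which is one way to read the claim that augmented samples remain structurally coherent.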
