Online Data Augmentation for Forecasting with Deep Learning

Vitor Cerqueira, Moisés Santos, Luis Roque, Yassine Baghoussi, Carlos Soares

TL;DR

This work tackles forecasting with multiple univariate time series under data scarcity by introducing online data augmentation, generating synthetic samples on-the-fly during training in a model-agnostic framework. By pairing each batch's originals with corresponding synthetic variants, it maintains a balanced representation and avoids storing large augmented datasets. Empirical results across six datasets, three neural architectures, and seven generation methods show online augmentation often yields better forecasting accuracy than offline augmentation or no augmentation. The authors provide a public, extensible framework to reproduce and extend these findings, with potential for adaptive and multivariate extensions in future work.

Abstract

Deep learning approaches are increasingly used to tackle forecasting tasks involving datasets with multiple univariate time series. A key factor in the successful application of these methods is a large enough training sample size, which is not always available. Synthetic data generation techniques can be applied in these scenarios to augment the dataset. Data augmentation is typically applied offline before training a model. However, when training with mini-batches, some batches may contain a disproportionate number of synthetic samples that do not align well with the original data characteristics. This work introduces an online data augmentation framework that generates synthetic samples during the training of neural networks. By creating synthetic samples for each batch alongside their original counterparts, we maintain a balanced representation between real and synthetic data throughout the training process. This approach fits naturally with the iterative nature of neural network training and eliminates the need to store large augmented datasets. We validated the proposed framework using 3797 time series from 6 benchmark datasets, three neural architectures, and seven synthetic data generation techniques. The experiments suggest that online data augmentation leads to better forecasting performance compared to offline data augmentation or no augmentation approaches. The framework and experiments are publicly available.
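The batch-level idea described above (pair each original in a mini-batch with a freshly generated synthetic copy, so nothing is stored) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' framework. The sliding-window setup, Gaussian jittering as the generator, and augmenting inputs and targets jointly are all assumptions. `make_windows`, `jitter`, and `online_augmented_batches` are hypothetical names.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

# A training example: (lagged inputs, future targets). Windowing is an assumption.
Window = Tuple[List[float], List[float]]


def make_windows(series: Sequence[float], n_lags: int, horizon: int) -> List[Window]:
    """Slide a window over one series to build (input, target) pairs."""
    last = len(series) - n_lags - horizon + 1
    return [
        (list(series[i:i + n_lags]), list(series[i + n_lags:i + n_lags + horizon]))
        for i in range(last)
    ]


def jitter(values: List[float], rng: random.Random, sigma: float = 0.05) -> List[float]:
    """Hypothetical generator: add Gaussian noise scaled by the window's std."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [v + rng.gauss(0.0, sigma * std) for v in values]


def online_augmented_batches(
    windows: List[Window],
    batch_size: int,
    augment: Callable[[List[float], random.Random], List[float]],
    seed: int = 0,
) -> Iterator[List[Window]]:
    """Yield shuffled mini-batches where every real window is paired with one
    freshly generated synthetic copy, so each batch is half real, half synthetic."""
    rng = random.Random(seed)
    order = list(range(len(windows)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        real = [windows[i] for i in order[start:start + batch_size]]
        synthetic = []
        for x, y in real:
            # Augment inputs and targets jointly so the synthetic target stays
            # consistent with its synthetic inputs (an assumption of this sketch).
            full = augment(x + y, rng)
            synthetic.append((full[:len(x)], full[len(x):]))
        yield real + synthetic  # synthetic samples are discarded after the batch


# Usage: a toy seasonal series; a new seed per epoch yields new synthetic samples.
series = [float(t % 12) for t in range(60)]
windows = make_windows(series, n_lags=6, horizon=2)
for epoch in range(2):
    for batch in online_augmented_batches(windows, batch_size=16, augment=jitter, seed=epoch):
        pass  # a hypothetical model.train_step(batch) would go here
```

Because synthetic samples are regenerated per batch, no batch can be dominated by synthetic data. This contrasts with offline augmentation, where shuffling a pre-built augmented set leaves the real/synthetic ratio per batch to chance.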

Paper Structure (20 sections, 2 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: High-level workflow of synthetic time series generation based on STL (Seasonal-Trend decomposition using Loess) and MBB (moving block bootstrap). The blue shaded circle represents synthetic data.
  • Figure 2: Example of synthetic time series generated from an original one. This example focuses on transformations that create a synthetic time series without reference to other series.
  • Figure 3: Typical workflow for time series data augmentation in the context of forecasting. A set of synthetic time series is created and concatenated with the original data, and the combined dataset is used to train a forecasting model.
  • Figure 4: High-level workflow of the training process of a neural network using the proposed framework for online data augmentation.
  • Figure 5: Traditional time series data partitioning process into training, validation, and test sets for developing forecasting models.
  • ...and 3 more figures
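The decomposition-plus-bootstrap generator named in Figure 1 can be sketched as follows. This is a minimal sketch assuming an additive decomposition. The synthetic series keeps the original trend and seasonal components and replaces the remainder with a moving block bootstrap. To stay dependency-free it swaps STL for a simpler moving-average decomposition. The function names, the period of 12, and the block size of 8 are hypothetical, not the authors' settings.

```python
import random
from typing import List, Sequence, Tuple


def simple_decompose(
    series: Sequence[float], period: int
) -> Tuple[List[float], List[float], List[float]]:
    """Additive trend + seasonal + remainder split.
    Stand-in for STL: moving-average trend and per-phase seasonal means."""
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    half = period // 2
    # Trend: mean over a centered window, truncated at the series edges.
    trend = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    detrended = [y - tr for y, tr in zip(series, trend)]
    # Seasonal: mean detrended value per phase, centered so one cycle sums to zero.
    phase_means = [
        sum(detrended[p::period]) / len(detrended[p::period]) for p in range(period)
    ]
    offset = sum(phase_means) / period
    seasonal = [phase_means[t % period] - offset for t in range(n)]
    remainder = [y - tr - s for y, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder


def moving_block_bootstrap(
    values: Sequence[float], block_size: int, rng: random.Random
) -> List[float]:
    """Resample contiguous blocks with replacement to keep short-range dependence."""
    n = len(values)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(values)")
    starts = range(n - block_size + 1)
    out: List[float] = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(values[s:s + block_size])
    return out[:n]  # trim the last block to the original length


def decompose_bootstrap_series(
    series: Sequence[float], period: int, block_size: int, rng: random.Random
) -> List[float]:
    """Synthetic series = original trend + seasonal + bootstrapped remainder."""
    trend, seasonal, remainder = simple_decompose(series, period)
    boot = moving_block_bootstrap(remainder, block_size, rng)
    return [tr + s + r for tr, s, r in zip(trend, seasonal, boot)]


# Usage: a toy monthly-like series with trend, seasonality, and noise.
rng = random.Random(42)
series = [10 + 0.1 * t + 3 * ((t % 12) < 6) + rng.gauss(0, 0.5) for t in range(96)]
synthetic = [decompose_bootstrap_series(series, period=12, block_size=8, rng=rng) for _ in range(3)]
```

With statsmodels installed, `statsmodels.tsa.seasonal.STL(series, period=12).fit()` returns a result exposing `.trend`, `.seasonal`, and `.resid`, which could replace `simple_decompose` to match the STL step more closely.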