Data Augmentation techniques in time series domain: A survey and taxonomy
Guillermo Iglesias, Edgar Talavera, Ángel González-Prieto, Alberto Mozo, Sandra Gómez-Canaval
TL;DR
This survey addresses the challenge of limited, privacy-constrained time-series data by cataloging data augmentation methods across traditional transformations, VAEs, and GAN-based approaches. It organizes methods into a taxonomy, analyzes evaluation metrics and their applicability, and contrasts downstream performance with distributional similarity measures. The work highlights practical considerations, including transformation validity, training stability for generative models, and task-specific trade-offs in anomaly detection and data imputation. By clarifying when and how to apply each technique and how to assess synthetic time-series data, the paper aims to guide researchers and practitioners toward more reliable and effective augmentation strategies.
Abstract
Recent advances in Deep Learning-based generative models have quickly been put to use in the area of time series, where their performance has been remarkable. Deep neural networks that work with time series depend heavily on the size and consistency of the datasets used in training. Such datasets are not usually abundant in the real world, where data are often limited and subject to constraints that must be guaranteed. An effective way to increase the amount of data is therefore to use Data Augmentation techniques, either by adding noise or permutations or by generating new synthetic data. This work systematically reviews the current state of the art in the area to provide an overview of the available algorithms and proposes a taxonomy of the most relevant research. The efficiency of the different variants is evaluated as a central part of the process, and the metrics used to evaluate performance and the main problems concerning each model are analysed. The ultimate aim of this study is to provide a summary of the evolution and performance of the areas that produce better results, to guide future researchers in this field.
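
As a rough illustration of the two traditional transformations named in the abstract (adding noise and permuting), the following minimal Python sketch may help. It is not taken from the survey: the function names `jitter` and `permute_segments`, the Gaussian noise model, and the default values of `sigma` and `n_segments` are all assumptions chosen for the example.

```python
import numpy as np


def jitter(series, sigma=0.05, rng=None):
    """Return a copy of the series with zero-mean Gaussian noise added.

    `sigma` (noise standard deviation) is an illustrative default,
    not a value recommended by the survey.
    """
    rng = np.random.default_rng() if rng is None else rng
    series = np.asarray(series, dtype=float)
    return series + rng.normal(0.0, sigma, size=series.shape)


def permute_segments(series, n_segments=4, rng=None):
    """Split a 1-D series into contiguous segments and shuffle their order.

    The values are preserved; only the order of the segments changes.
    """
    rng = np.random.default_rng() if rng is None else rng
    series = np.asarray(series, dtype=float)
    segments = np.array_split(series, n_segments)
    order = rng.permutation(len(segments))
    return np.concatenate([segments[i] for i in order])


# Usage on a toy sine wave (hypothetical data, fixed seed for repeatability).
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
x_noisy = jitter(x, sigma=0.05, rng=rng)
x_perm = permute_segments(x, n_segments=4, rng=rng)
```

Both functions keep the length of the series unchanged. Segment permutation can break temporal dependencies, so whether it is a valid transformation depends on the task and the data at hand.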
