Extreme value forecasting using relevance-based data augmentation with deep learning models

Junru Hua, Rahul Ahluwalia, Rohitash Chandra

TL;DR

This paper addresses extreme value forecasting in time series with a relevance-based data augmentation framework that combines SMOTE-based resampling and GAN-based augmentation with deep learning models (Conv-LSTM, BD-LSTM) for multistep-ahead prediction. It defines a general relevance function to identify extremes, compares the SMOTER-regular, SMOTER-bin, and GAN approaches, and evaluates them on five diverse datasets using tail-focused metrics such as SER alongside RMSE. SMOTE-based methods generally outperform GAN-based augmentation, with SMOTER-bin suiting periodic data and SMOTER-regular suiting volatile sequences, while Conv-LSTM and BD-LSTM offer complementary strengths across data regimes. The work highlights the importance of matching the model and augmentation strategy to the data, and suggests ensembles, adaptive resampling, and uncertainty quantification as future directions for more robust extreme-value forecasting.

Abstract

Data augmentation with generative adversarial networks (GANs) has been popular for class imbalance problems, mainly in pattern classification and computer vision applications. Extreme value forecasting is a challenging field with applications ranging from finance to climate change. In this study, we present a data augmentation framework for forecasting extreme values that combines deep learning models with data augmentation models such as GANs and the synthetic minority oversampling technique (SMOTE). We use convolutional long short-term memory (Conv-LSTM) and bidirectional long short-term memory (BD-LSTM) networks for multistep-ahead prediction featuring extremes. We investigate which data augmentation models are most suitable, taking into account prediction accuracy both overall and in extreme regions, along with computational efficiency. We also present novel strategies for incorporating data augmentation that treat extreme values according to a relevance function. Our results indicate that the SMOTE-based strategy consistently demonstrates superior adaptability, leading to improved performance across both short- and long-horizon forecasts. Conv-LSTM and BD-LSTM exhibit complementary strengths: the former excels on periodic, stable datasets, while the latter performs better on chaotic or non-stationary sequences.
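To make the idea of relevance-based augmentation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a relevance function built from percentiles of the training targets (relevance 1 at or beyond the 5th and 95th percentiles, 0 at the median), and it uses linear interpolation as a dependency-free stand-in for the PCHIP curve named in Figure 1. Rare samples, those whose relevance exceeds a threshold, are then oversampled by SMOTE-style interpolation between nearby rare samples. The function names `make_relevance` and `oversample_rare`, the percentile choices, the threshold of 0.8, and the single-step targets are all illustrative assumptions.

```python
import math
import random
from statistics import quantiles


def make_relevance(targets, low_pct=5, high_pct=95):
    """Build phi(y) in [0, 1] from percentiles of the training targets.

    Assumed control points: relevance 1 at or beyond the low/high
    percentiles and 0 at the median, with linear interpolation between
    them (a stand-in for the PCHIP curve named in Figure 1).
    """
    # 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(targets, n=100, method="inclusive")
    lo, mid, hi = cuts[low_pct - 1], cuts[49], cuts[high_pct - 1]

    def phi(y):
        if y <= lo or y >= hi:
            return 1.0
        if y < mid:
            return (mid - y) / (mid - lo)
        return (y - mid) / (hi - mid)

    return phi


def oversample_rare(X, y, phi, threshold=0.8, k=3, seed=0):
    """Create one synthetic (window, target) pair per rare sample."""
    rng = random.Random(seed)
    rare = [(xi, yi) for xi, yi in zip(X, y) if phi(yi) >= threshold]
    synthetic = []
    if len(rare) < 2:
        return synthetic  # nothing to interpolate with
    for i, (xi, yi) in enumerate(rare):
        # Rank the other rare samples by squared Euclidean distance to xi.
        others = [r for j, r in enumerate(rare) if j != i]
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(xi, r[0])))
        xn, yn = rng.choice(others[:k])  # one of the k nearest rare neighbours
        t = rng.random()  # interpolation weight in [0, 1)
        new_x = [a + t * (b - a) for a, b in zip(xi, xn)]
        new_y = yi + t * (yn - yi)  # simplification: same weight for the target
        synthetic.append((new_x, new_y))
    return synthetic


# Toy series: a sine wave with occasional spikes standing in for extremes.
series = [math.sin(0.3 * t) + (3.0 if t % 37 == 0 else 0.0) for t in range(400)]
window = 5
X = [series[i:i + window] for i in range(len(series) - window)]
y = [series[i + window] for i in range(len(series) - window)]

phi = make_relevance(y)
synthetic = oversample_rare(X, y, phi)
print(f"{len(synthetic)} synthetic samples added to {len(y)} originals")
```

The synthetic pairs would be appended to the training set before fitting a forecasting model. Choosing neighbours by distance in the input window keeps interpolation mostly between similar extremes, although a fuller treatment would likely handle low and high extremes separately and extend the targets to multiple steps ahead.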

Paper Structure

This paper contains 20 sections, 7 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Relevance function construction using PCHIP on percentiles.
  • Figure 2: Relevance-based framework for extreme value forecasting using data augmentation and deep learning.
  • Figure 3: Cyclone dataset visualisation.
  • Figure 4: Performance of resampling strategies on 5-step-ahead SER-5% for the Bike and Cyclone datasets using BD-LSTM.
  • Figure 5: Performance comparison of three resampling strategies on the Cyclone-SPO dataset using Conv-LSTM (a) and BD-LSTM (b). Blue lines represent No Resampling, orange lines represent SMOTER-regular, and green lines represent SMOTER-bin.
  • ...and 2 more figures