Meta-learning and Data Augmentation for Stress Testing Forecasting Models

Ricardo Inácio, Vitor Cerqueira, Marília Barandas, Carlos Soares

TL;DR

This work addresses the reliability of univariate time-series forecasting under stress by introducing MAST, a meta-learning framework that predicts the probability of large forecasting errors from structural time-series features, using data augmentation to counteract the class imbalance between rare large-error cases and the rest. The method operates in two stages: a development stage that builds a metamodel from a meta-dataset augmented with synthetic large-error samples, and an inference stage that estimates the large-error probability for new series. Experiments on the M3, M4, and Tourism datasets show that MAST can identify stressful conditions, with ADASYN augmentation often yielding the best predictive performance and SHAP analyses revealing which features (e.g., linearity, spike) drive risk. The approach improves forecast transparency and risk awareness, enabling more robust and trustworthy decision-making, and the authors make code and data publicly available.
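To make the two-stage workflow concrete, here is a minimal Python sketch. It assumes the statistical features have already been extracted into matrices `X_meta` and `X_new`, and that each development series is labelled 1 when the base forecasting model incurred a large error; ADASYN comes from the imbalanced-learn package, while the random-forest metamodel is an illustrative stand-in rather than the specific learner used in the paper.

```python
# Minimal sketch of MAST's two stages. Assumes feature matrices
# X_meta / X_new have already been extracted from the series; the
# random forest is an illustrative choice of metamodel.
from imblearn.over_sampling import ADASYN
from sklearn.ensemble import RandomForestClassifier

def develop_metamodel(X_meta, y_large_error, random_state=0):
    """Development stage: oversample the rare large-error class with
    ADASYN, then fit a binary classifier as the metamodel."""
    X_aug, y_aug = ADASYN(random_state=random_state).fit_resample(
        X_meta, y_large_error
    )
    metamodel = RandomForestClassifier(random_state=random_state)
    metamodel.fit(X_aug, y_aug)
    return metamodel

def predict_stress(metamodel, X_new):
    """Inference stage: probability that the base forecasting model
    will incur a large error on each new series."""
    return metamodel.predict_proba(X_new)[:, 1]
```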

Abstract

The effectiveness of univariate forecasting models is often hampered by conditions that cause them stress. A model is considered to be under stress if it shows a negative behaviour, such as higher-than-usual errors or increased uncertainty. Understanding the factors that cause stress to forecasting models is important for improving their reliability, transparency, and utility. This paper addresses this problem by contributing a novel framework called MAST (Meta-learning and data Augmentation for Stress Testing). The proposed approach aims to model and characterize stress in univariate time series forecasting models, focusing on conditions where they exhibit large errors. In particular, MAST is a meta-learning approach that predicts the probability that a given model will perform poorly on a given time series based on a set of statistical time series features. MAST also encompasses a novel data augmentation technique based on oversampling to improve the metadata concerning stress. We conducted experiments using three benchmark datasets that contain a total of 49,794 time series to validate the performance of MAST. The results suggest that the proposed approach is able to identify conditions that lead to large errors. The method and experiments are publicly available in a repository.
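As a complement to the abstract, the following hedged sketch shows one way the binary meta-target could be constructed: score each series with SMAPE (the metric shown in Figure 1) and flag series whose error exceeds a chosen percentile threshold. The function names and the 90th-percentile cut-off are illustrative; the paper examines several thresholds (see Figure 4).

```python
# Hedged sketch: build the binary "large error" meta-target from
# per-series forecast errors. Names and the 90th-percentile cut-off
# are illustrative, not taken from the paper.
import numpy as np

def smape(y_true, y_pred):
    """A common SMAPE variant on a 0-2 scale (multiply by 100 for %)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(np.mean(np.abs(y_true - y_pred) / denom))

def label_large_errors(scores, percentile=90):
    """Return 1 for series whose error exceeds the percentile threshold."""
    scores = np.asarray(scores, dtype=float)
    return (scores > np.percentile(scores, percentile)).astype(int)
```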


Paper Structure

This paper contains 25 sections, 1 equation, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Forecasting performance of a model across several univariate time series according to SMAPE.
  • Figure 2: Workflow behind MAST, which is split into a development stage and an inference stage. In the development stage, we conduct performance estimation, feature extraction, data augmentation, and meta-learning. Then, the resulting metamodel is applied during the inference stage to predict whether a forecasting model will incur a large error.
  • Figure 3: Data preparation step. For each time series, the last h observations (where h is the forecasting horizon) are split off as the test set and the rest is assigned to the train set; a second split of the same size at the end of the train set defines the validation set, with the remainder forming the development set (a minimal sketch of this split follows this list).
  • Figure 4: Analysis of the impact of the error percentile threshold on metamodel performance, measured in AUC. For each time series collection, the performance of the base and augmented metamodels is compared across six percentile error thresholds.
  • Figure 5: Shapley values for the 5 most relevant features of the M3 dataset's metamodel, indicating which ones contribute positively and negatively to its output. Values to the right of the vertical line indicate a positive influence, while those to the left indicate a negative one. The colour indicates the magnitude of the feature's value (a hedged sketch of such an analysis also follows this list).
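
The splits described in Figure 3 can be expressed in a few lines; this is a minimal sketch assuming a series stored as a one-dimensional array, with a hypothetical function name:

```python
import numpy as np

def split_series(y, h):
    """Split a series as in Figure 3: the last h points form the test
    set, the preceding h points the validation set, and the remainder
    the development set."""
    y = np.asarray(y)
    test = y[-h:]
    train = y[:-h]            # everything before the test block
    validation = train[-h:]
    development = train[:-h]
    return development, validation, test
```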
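And here is a hedged sketch of the kind of SHAP analysis behind Figure 5, assuming a fitted tree-based metamodel such as the one in the earlier sketch; shap.TreeExplainer and shap.summary_plot are standard calls in the shap package, while explain_metamodel is a hypothetical wrapper.

```python
import shap

def explain_metamodel(metamodel, X_meta):
    """Plot the most influential features for predicted stress (cf. Figure 5)."""
    explainer = shap.TreeExplainer(metamodel)
    shap_values = explainer.shap_values(X_meta)
    # Some shap versions return one array per class for binary
    # classifiers; keep the large-error class in that case.
    if isinstance(shap_values, list):
        shap_values = shap_values[1]
    shap.summary_plot(shap_values, X_meta, max_display=5)
```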