Meta-learning and Data Augmentation for Stress Testing Forecasting Models
Ricardo Inácio, Vitor Cerqueira, Marília Barandas, Carlos Soares
TL;DR
This work addresses the reliability of univariate time-series forecasting under stress by introducing MAST, a meta-learning framework that predicts the probability of large forecasting errors from statistical time-series features, using data augmentation to counteract the class imbalance of rare large-error cases. The method operates in two stages: a development stage that builds a metamodel from a meta-dataset augmented with synthetic large-error samples, and an inference stage that estimates the large-error probability for new series. Experiments on the M3, M4, and Tourism datasets show that MAST can identify stressful conditions, with ADASYN augmentation often yielding the best predictive performance and SHAP analyses revealing which features (e.g., linearity, spike) drive risk. The approach improves forecast transparency and risk awareness, enabling more robust and trustworthy decision-making, and the authors make code and data publicly available.
Abstract
The effectiveness of univariate forecasting models is often hampered by conditions that cause them stress. A model is considered to be under stress if it shows a negative behaviour, such as higher-than-usual errors or increased uncertainty. Understanding the factors that cause stress to forecasting models is important to improve their reliability, transparency, and utility. This paper addresses this problem by contributing a novel framework called MAST (Meta-learning and data Augmentation for Stress Testing). The proposed approach aims to model and characterize stress in univariate time series forecasting models, focusing on conditions where they exhibit large errors. In particular, MAST is a meta-learning approach that predicts the probability that a given model will perform poorly on a given time series based on a set of statistical time series features. MAST also encompasses a novel data augmentation technique based on oversampling to improve the metadata concerning stress. We conducted experiments using three benchmark datasets that contain a total of 49,794 time series to validate the performance of MAST. The results suggest that the proposed approach is able to identify conditions that lead to large errors. The method and experiments are publicly available in a repository.
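The two-stage pipeline described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the feature set, the toy data, and the labels are placeholders, and a simple SMOTE-style interpolation oversampler stands in for the ADASYN augmentation used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def series_features(y):
    """A small, illustrative set of statistical features of a univariate series."""
    diffs = np.diff(y)
    return np.array([
        y.mean(),
        y.std(),
        np.abs(diffs).mean(),                          # roughness
        (np.abs(y - y.mean()) > 2 * y.std()).mean(),   # spike proportion
    ])

def oversample_minority(X, y, rng):
    """Interpolation-based oversampling of the rare large-error class
    (a simple stand-in for ADASYN)."""
    Xm = X[y == 1]
    n_new = (y == 0).sum() - len(Xm)                   # balance the classes
    idx = rng.integers(0, len(Xm), size=(n_new, 2))
    lam = rng.random((n_new, 1))
    X_new = Xm[idx[:, 0]] + lam * (Xm[idx[:, 1]] - Xm[idx[:, 0]])
    return np.vstack([X, X_new]), np.concatenate([y, np.ones(n_new, dtype=int)])

# --- Development stage: build the meta-dataset and fit the metamodel ---
# Each meta-example pairs a series' features with a binary label:
# 1 if the base forecasting model incurred a large error on that series.
n_series, length = 500, 60
X_meta, y_meta = [], []
for _ in range(n_series):
    y = np.cumsum(rng.normal(size=length))             # toy series
    X_meta.append(series_features(y))
    y_meta.append(int(rng.random() < 0.1))             # rare large-error label (toy)
X_meta, y_meta = np.array(X_meta), np.array(y_meta)

X_res, y_res = oversample_minority(X_meta, y_meta, rng)
metamodel = RandomForestClassifier(random_state=0).fit(X_res, y_res)

# --- Inference stage: estimate the large-error probability for a new series ---
new_series = np.cumsum(rng.normal(size=length))
p_stress = metamodel.predict_proba(series_features(new_series).reshape(1, -1))[0, 1]
print(f"Estimated probability of a large error: {p_stress:.2f}")
```

In this sketch the metamodel is an ordinary classifier over the meta-dataset; the key design choice mirrored from the paper is oversampling the rare large-error class before fitting, so the metamodel does not collapse to always predicting "no stress".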
