Hyperparameter Tuning MLPs for Probabilistic Time Series Forecasting

Kiran Madhusudhanan, Shayan Jawed, Lars Schmidt-Thieme

TL;DR

This work addresses the gap in hyperparameter optimization for time-series forecasting with neural models by focusing on time-series-specific hyperparameters such as context length and validation strategy. It analyzes the NLinear MLP variant and extends it to probabilistic forecasting within GluonTS, including distribution heads and gradient statistics. The authors introduce TSBench, the largest metadataset to date for probabilistic time-series forecasting, totaling $97{,}200$ evaluations across $20$ Monash datasets with $4{,}800$ configurations per dataset and multi-fidelity evaluation capabilities. They benchmark multiple HPO strategies (SMAC, HyperBand, BOHB, Random), demonstrate context-length and learning-rate importance, and show that a linear MLP is a robust baseline while TSBench enables effective transfer and learning-curve forecasting insights for HPO.

Abstract

Time series forecasting attempts to predict future events by analyzing past trends and patterns. Although well researched, certain critical aspects pertaining to the use of deep learning in time series forecasting remain ambiguous. Our research primarily focuses on examining the impact of specific hyperparameters related to time series, such as context length and validation strategy, on the performance of the state-of-the-art MLP model in time series forecasting. We have conducted a comprehensive series of experiments involving 4800 configurations per dataset across 20 time series forecasting datasets, and our findings demonstrate the importance of tuning these parameters. Furthermore, in this work, we introduce the largest metadataset for time series forecasting to date, named TSBench, comprising 97200 evaluations, which is a twentyfold increase compared to previous works in the field. Finally, we demonstrate the utility of the created metadataset on multi-fidelity hyperparameter optimization tasks.
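To make the multi-fidelity setting concrete, the following is a minimal sketch of successive halving, the subroutine at the core of HyperBand, run against a tabular lookup of validation losses. It is an illustration under stated assumptions, not the authors' code: `toy_lookup`, the configuration grid, and the epoch budgets are hypothetical stand-ins and do not reflect the actual TSBench interface or search space.

```python
import math


def successive_halving(configs, lookup, min_epochs=1, max_epochs=27, eta=3):
    """Keep the best 1/eta of the configurations at each fidelity level.

    `lookup(config, epochs)` returns a validation loss (lower is better).
    """
    survivors = list(configs)
    epochs = min_epochs
    while True:
        # Rank the surviving configurations at the current (cheap) fidelity.
        ranked = sorted(survivors, key=lambda c: lookup(c, epochs))
        if epochs >= max_epochs or len(ranked) == 1:
            return ranked[0]
        # Promote the top fraction and raise the budget by a factor of eta.
        survivors = ranked[: max(1, len(ranked) // eta)]
        epochs = min(epochs * eta, max_epochs)


def toy_lookup(config, epochs):
    """Hypothetical loss surface standing in for a metadataset query."""
    context_length, learning_rate = config
    floor = 0.1 * abs(math.log2(context_length) - 5)
    floor += 0.1 * abs(math.log10(learning_rate) + 3)
    return floor + 1.0 / epochs  # loss decreases with more training epochs


# Hypothetical grid over context length and learning rate.
grid = [(c, lr) for c in (8, 16, 32, 64, 128) for lr in (1e-4, 1e-3, 1e-2, 1e-1)]
best = successive_halving(grid, toy_lookup)
print(best)
```

Because every evaluation is a table lookup rather than a training run, a loop like this can compare HPO strategies at negligible cost, which is the practical appeal of a precomputed metadataset.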

Paper Structure (12 sections, 3 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Model Architecture. The parameters $C$ and $\delta$ denote the context length and forecast horizon, respectively. The hidden layers of the model are represented by $f$, $g$, and $h$, and are interspersed with ELU non-linearities. The parameter $d$ denotes the distribution parameters that are learned per prediction time step.
  • Figure 2: Prediction length vs. context length, colored by dataset frequency. Longer prediction lengths.
  • Figure 3: Prediction length vs. context length, colored by dataset frequency. Shorter prediction lengths.
  • Figure 4: Hyperparameter importance score
  • Figure 5: Architecture selection globally across multiple datasets
  • ...and 1 more figures
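
The architecture described in the Figure 1 caption can be sketched as a forward pass that maps $C$ past values to distribution parameters for each of the $\delta$ forecast steps. This is a rough illustration, not the paper's implementation: the hidden width, the random initialization, and the choice of a Gaussian head with two parameters per step (a location and a softplus-constrained scale) are assumptions made here for the example.

```python
import math
import random


def elu(values, alpha=1.0):
    """ELU non-linearity applied element-wise."""
    return [v if v > 0 else alpha * (math.exp(v) - 1.0) for v in values]


def softplus(v):
    """Numerically stable softplus, used to keep the scale positive."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (hypothetical init)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    """Affine map: one output per weight row."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def forecast_params(context, layers, horizon, n_params=2):
    """Map C context values to `horizon` tuples of distribution parameters."""
    f, g, h = layers
    hidden = elu(dense(f, context))   # layer f followed by ELU
    hidden = elu(dense(g, hidden))    # layer g followed by ELU
    flat = dense(h, hidden)           # layer h emits horizon * n_params values
    params = []
    for t in range(horizon):
        loc = flat[t * n_params]
        scale = softplus(flat[t * n_params + 1])  # assumed Gaussian scale
        params.append((loc, scale))
    return params


# Assumed sizes: context length C, horizon DELTA, hidden width, parameters per step.
C, DELTA, HIDDEN, D = 8, 3, 16, 2
rng = random.Random(0)
layers = (
    make_layer(C, HIDDEN, rng),
    make_layer(HIDDEN, HIDDEN, rng),
    make_layer(HIDDEN, DELTA * D, rng),
)
context = [math.sin(0.5 * i) for i in range(C)]
params = forecast_params(context, layers, DELTA, D)
print(params)
```

Each tuple in `params` parameterizes the predictive distribution for one forecast step; a different distribution head would change only how the final layer's outputs are split and constrained.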