AALF: Almost Always Linear Forecasting

Matthias Jakobs, Thomas Liebig

TL;DR

AALF addresses the tension between interpretability and predictive accuracy in time-series forecasting through online model selection: at each time step it chooses between an autoregressive forecaster and a deep-learning model, guided by a learnable binary classifier under an interpretability constraint $B$ (i.e., using the interpretable model at least $B$ times). The framework derives the optimal per-step choice using a loss difference $\ell(t)$ and trains a classifier on features that summarize model disagreement and history to approximate this optimal policy. Empirically, AALF achieves competitive RMSE/SMAPE with significantly greater interpretability across 6 real-world datasets and 3500+ time series, outperforming or matching state-of-the-art online selectors for many configurations, particularly at moderate interpretability levels ($p=B/T$). The approach is generic, scalable, and extensible, with future directions including multivariate extensions and constraint-guaranteed strategies, offering practical impact for safer, auditable forecasting in high-stakes settings.

Abstract

Recent work on time-series forecasting increasingly leverages the high predictive power of Deep Learning models. With this increase in model complexity, however, comes a lack of understanding of the underlying model decision process, which is problematic for high-stakes application scenarios. At the same time, simple, interpretable forecasting methods such as ARIMA still perform very well, sometimes on par with Deep Learning approaches. We argue that simple models are good enough most of the time, and that forecasting performance could be improved by choosing a Deep Learning method only for a few important predictions, increasing the overall interpretability of the forecasting process. In this context, we propose a novel online model selection framework which learns to identify these predictions. An extensive empirical study on various real-world datasets shows that our selection methodology performs comparably to state-of-the-art online model selection methods in most cases while being significantly more interpretable. We find that almost always choosing a simple autoregressive linear model for forecasting results in competitive performance, suggesting that the need for opaque black-box models in time-series forecasting might be smaller than recent works would suggest.

Paper Structure

This paper contains 11 sections, 1 theorem, 13 equations, 6 figures, 3 tables.

Key Result

Proposition 1

The optimal solution to the optimization problem (eq:optim) is the vector $\bm{s}^* = (s_1^*, \dots, s_T^*)$ with elements $s^*_{\pi(t)} = 1$ if $t \leq B$ or $\ell(\pi(t)) < 0$, and $s^*_{\pi(t)} = 0$ otherwise, where $\ell(t) := (f(\bm{x}_t)-y_t)^2 - (g(\bm{x}_t) - y_t)^2$ and $\pi: [T] \rightarrow [T]$ is a permutation satisfying $\ell(\pi(t)) \leq \ell(\pi(t+1)) ~ \forall t \in [T-1]$.
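The selection rule in Proposition 1 (illustrated in Figure 2) can be sketched in a few lines of plain Python. This is a minimal sketch, not the authors' implementation. It assumes that $s_t = 1$ means choosing the interpretable model $f$, that the loss is the squared error, and that ties at $\ell(t) = 0$ beyond the first $B$ entries go to $g$. The function names and the toy numbers are hypothetical.

```python
def loss_difference(f_pred, g_pred, y):
    """l(t) = (f(x_t) - y_t)^2 - (g(x_t) - y_t)^2; negative means f is better."""
    return [(f - t) ** 2 - (g - t) ** 2 for f, g, t in zip(f_pred, g_pred, y)]


def optimal_selection(f_pred, g_pred, y, B):
    """Return s with s[t] = 1 where f is chosen, using f at least B times.

    Assumption: ties (l(t) == 0) past the first B entries are given to g.
    """
    ell = loss_difference(f_pred, g_pred, y)
    # pi: time indices sorted by ascending loss difference,
    # i.e. the steps where f helps most come first.
    pi = sorted(range(len(ell)), key=lambda t: ell[t])
    s = [0] * len(ell)
    for rank, t in enumerate(pi):
        # The first B entries are taken to meet the constraint; after that,
        # f is chosen only while doing so still lowers the total loss.
        if rank < B or ell[t] < 0:
            s[t] = 1
    return s


# Toy data (illustrative only, not from the paper).
y = [1.0, 2.0, 3.0, 4.0]
f_pred = [1.0, 2.5, 3.0, 6.0]   # interpretable model
g_pred = [2.0, 2.0, 3.5, 4.0]   # deep-learning model

loose = optimal_selection(f_pred, g_pred, y, B=1)
tight = optimal_selection(f_pred, g_pred, y, B=3)
print(loose, tight)
```

With `B=1` the constraint is already met by the steps where $f$ has the lower loss, so extra entries are picked only because they reduce the loss. With `B=3` one step with a positive loss difference has to be included, which mirrors the left panel of Figure 2.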

Figures (6)

  • Figure 1: Schematic overview of our proposed method AALF: Given a set of windowed time-series $\{\bm{x}_1, \dots, \bm{x}_T \}$ with the corresponding forecasts $\{ y_1, \dots, y_T \}$, we compute both models' predictions $f(\bm{x}_t)$ and $g(\bm{x}_t)$, as well as some additional features. The predictions, along with the label, are used to compute the optimal model selection $\bm{s}^*$, which we use as labels to estimate a classifier.
  • Figure 2: Illustrative example of how many (and which) entries to choose to get the optimal, constraint-satisfying solution. On the left, we have to choose the indices $\pi(4), \pi(5)$ and $\pi(6)$, even though their corresponding loss differences $\ell(t)$ are positive, to satisfy the constraint of choosing at least $B = 6$ entries. On the right, after choosing the first $6$ entries we can continue and further decrease the loss, choosing $7$ entries in total.
  • Figure 3: Critical Difference diagrams for all 6 datasets (each row corresponds to one dataset). The x axis shows the average rank of each model over the time-series in each dataset. If two models are not found to have significantly different average ranks based on a Wilcoxon signed-rank test, they are connected with a horizontal bar.
  • Figure 4: The optimal selection $\mathcal{O}(f, g)$ between pairs of models (shown as lines) versus the performance of the individual models. The y axis corresponds to average RMSE over the datasets while the x axis corresponds to how often $f$ is chosen over $g$. Note that $\mathcal{O}$ is computed using ground-truth data and thus represents the best possible loss achievable for a given $p = B/T$.
  • Figure 5: Critical Difference Diagram (over RMSE error) of all evaluated model selectors, computed over all datasets. The average rank is shown above (smaller is better). Two selectors are connected with a horizontal line if they did not show significantly different performance.
  • ...and 1 more figures
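
The average ranks on the x axis of the Critical Difference diagrams (Figures 3 and 5) can be illustrated with a short sketch. It assumes each method is ranked per time series by its error (rank 1 is best), that tied errors share the mean of their ranks, and that the ranks are then averaged over all series. The tie handling, the function name `average_ranks`, and the error values are assumptions for illustration, not taken from the paper. The Wilcoxon signed-rank test that decides which methods are connected is not shown.

```python
def average_ranks(errors):
    """Average rank per method.

    errors: dict mapping method name -> list of per-series errors
            (lower is better, all lists the same length).
    """
    names = list(errors)
    n_series = len(errors[names[0]])
    totals = {name: 0.0 for name in names}
    for i in range(n_series):
        # Sort the methods by their error on series i.
        column = sorted(((errors[name][i], name) for name in names),
                        key=lambda pair: pair[0])
        pos = 0
        while pos < len(column):
            # Find the run of tied errors starting at pos.
            end = pos
            while end + 1 < len(column) and column[end + 1][0] == column[pos][0]:
                end += 1
            # Tied methods share the mean of the 1-based ranks they occupy.
            shared_rank = ((pos + 1) + (end + 1)) / 2
            for _, name in column[pos:end + 1]:
                totals[name] += shared_rank
            pos = end + 1
    return {name: totals[name] / n_series for name in names}


# Hypothetical per-series RMSE values (not results from the paper).
rmse = {
    "linear":   [1.0, 2.0, 3.0],
    "deep":     [0.5, 2.0, 4.0],
    "selector": [0.8, 1.5, 2.5],
}
ranks = average_ranks(rmse)
print(ranks)
```

A smaller average rank means the method was more often among the best on individual series, which is why the diagrams read "smaller is better".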

Theorems & Definitions (2)

  • Proposition 1
  • proof