DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

David Salinas, Valentin Flunkert, Jan Gasthaus

TL;DR

The paper addresses probabilistic forecasting for thousands of related time series by learning a global autoregressive recurrent model, DeepAR, that yields calibrated predictive distributions through Monte Carlo sampling. It adopts a flexible likelihood framework (Gaussian or negative-binomial) and introduces scale-aware training and velocity-based sampling to cope with wide-ranging series magnitudes and sparsity, enabling accurate quantile estimates for decision-making under uncertainty. Empirical results on diverse real-world datasets show substantial accuracy gains over state-of-the-art methods, improved calibration, and the ability to forecast for new items with little history. The approach is practical at scale, requiring modest manual tuning, and has significant implications for inventory management, demand planning, and other applications that depend on reliable probabilistic forecasts.

Abstract

Probabilistic forecasting, i.e. estimating the probability distribution of a time series' future given its past, is a key enabler for optimizing business processes. In retail businesses, for example, forecasting demand is crucial for having the right inventory available at the right time at the right place. In this paper we propose DeepAR, a methodology for producing accurate probabilistic forecasts, based on training an autoregressive recurrent network model on a large number of related time series. We demonstrate how by applying deep learning techniques to forecasting, one can overcome many of the challenges faced by widely-used classical approaches to the problem. We show through extensive empirical evaluation on several real-world forecasting data sets accuracy improvements of around 15% compared to state-of-the-art methods.

Paper Structure

This paper contains 12 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Log-log histogram of the number of items versus number of sales for the 500K time series of ec, showing the scale-free nature (approximately straight line) present in the ec dataset (axis labels omitted due to the non-public nature of the data).
  • Figure 2: Summary of the model. Training (left): At each time step $t$, the inputs to the network are the covariates $\mathbf{x}_{i,t}$, the target value at the previous time step $z_{i,t-1}$, as well as the previous network output $\mathbf{h}_{i,t-1}$. The network output $\mathbf{h}_{i,t} = h(\mathbf{h}_{i,t-1}, z_{i,t-1}, \mathbf{x}_{i,t}, \Theta)$ is then used to compute the parameters $\theta_{i,t} = \theta(\mathbf{h}_{i,t}, \Theta)$ of the likelihood $\ell(z|\theta)$, which is used for training the model parameters. Prediction (right): The history of the time series $z_{i,t}$ is fed in for $t<t_0$; then, in the prediction range $t\ge t_0$, a sample $\hat{z}_{i,t} \sim \ell(\cdot|\theta_{i,t})$ is drawn and fed back for the next point until the end of the prediction range $t=t_0 + T$, generating one sample trace. Repeating this prediction process yields many traces representing the joint predicted distribution.
  • Figure 3: Example time series of ec. The vertical line separates the conditioning period from the prediction period. The black line shows the true target. In the prediction range we plot the p50 as a blue line (mostly zero for the three slow items) and the 80% confidence interval (shaded). The model learns accurate seasonality patterns and uncertainty estimates for items of different velocity and age.
  • Figure 4: Uncertainty growth over time for ISSM and DeepAR models. Unlike the ISSM, which postulates a linear growth of uncertainty, the behavior of uncertainty is learned from the data, resulting in a non-linear growth with a (plausibly) higher uncertainty around Q4. The aggregate is calculated over the entire ec dataset.
  • Figure 5: Coverage for two spans on the ec-sub dataset. The left panel shows the coverage for a single time-step interval, while the right panel shows the coverage for a larger time interval of 9 time-steps. When the samples are shuffled independently at each time step, the correlation across time within the prediction sample paths is destroyed and the forecast becomes less calibrated. This shuffled prediction also has a 10% higher $0.9$-risk.
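
The sampling loop described in Figure 2 and the shuffling experiment described in Figure 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation: `step` and `likelihood_params` are hypothetical hand-written stand-ins for the trained recurrent network $h(\cdot)$ and the parameter mapping $\theta(\cdot)$, the likelihood is assumed to be Gaussian, and the history and covariates are made-up toy values. The sketch only shows the mechanics: condition on the observed history, feed each drawn sample back in as the next input, repeat to obtain many traces, and read quantiles off the traces.

```python
import math
import random
import statistics


def step(h_prev, z_prev, x_t):
    """Toy stand-in (assumption) for h(h_{t-1}, z_{t-1}, x_t, Theta)."""
    return math.tanh(0.5 * h_prev + 0.3 * z_prev + 0.2 * x_t)


def likelihood_params(h_t):
    """Toy stand-in (assumption) for theta(h_t, Theta): Gaussian mean and std."""
    mu = 2.0 * h_t
    sigma = 0.5 + 0.1 * abs(h_t)  # always positive
    return mu, sigma


def sample_trace(history, covariates, horizon, rng):
    """Condition on the history (t < t0), then sample one trace for t >= t0."""
    t0 = len(history)
    h, z_prev = 0.0, 0.0
    # Conditioning range: the observed value is fed in as the previous target.
    for t in range(t0):
        h = step(h, z_prev, covariates[t])
        z_prev = history[t]
    # Prediction range: the drawn sample is fed back as the previous target.
    trace = []
    for t in range(t0, t0 + horizon):
        h = step(h, z_prev, covariates[t])
        mu, sigma = likelihood_params(h)
        z_prev = rng.gauss(mu, sigma)
        trace.append(z_prev)
    return trace


def quantile(values, q):
    """Empirical quantile: smallest sample with at least a fraction q at or below it."""
    s = sorted(values)
    idx = min(len(s) - 1, max(0, math.ceil(q * len(s)) - 1))
    return s[idx]


def shuffle_per_step(traces, rng):
    """Permute the samples independently at each time step.

    Per-step marginals are unchanged; dependence across time steps is broken.
    """
    columns = [list(col) for col in zip(*traces)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


rng = random.Random(0)
history = [1.0, 1.2, 0.8, 1.1, 0.9]  # toy observations for t < t0
horizon = 9                          # length of the prediction range
covariates = [math.sin(t / 2.0) for t in range(len(history) + horizon)]

# Many traces together represent the joint predictive distribution.
traces = [sample_trace(history, covariates, horizon, rng) for _ in range(200)]

# Per-step quantiles, e.g. the p50 line and an 80% interval (p10 to p90).
p10 = [quantile(col, 0.1) for col in zip(*traces)]
p50 = [quantile(col, 0.5) for col in zip(*traces)]
p90 = [quantile(col, 0.9) for col in zip(*traces)]

# Totals over the whole span depend on the correlation between time steps.
shuffled = shuffle_per_step(traces, rng)
spread_joint = statistics.pstdev(sum(tr) for tr in traces)
spread_shuffled = statistics.pstdev(sum(tr) for tr in shuffled)
print("std of span total, original traces:", spread_joint)
print("std of span total, shuffled traces:", spread_shuffled)
```

Shuffling leaves every single-step quantile exactly as it was, because each time step keeps the same set of samples. Only quantities that combine several time steps, such as the total over the 9-step span, can change, which is why the effect in Figure 5 appears for the longer interval.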