
MQTransformer: Multi-Horizon Forecasts with Context Dependent and Feedback-Aware Attention

Carson Eisenach, Yagna Patel, Dhruv Madeka

TL;DR

This work addresses the challenge of accurate multi-horizon probabilistic forecasting by introducing MQTransformer, which adds context-dependent horizon-specific decoder–encoder attention, learned position encodings from event indicators, and a decoder self-attention mechanism that leverages forecast feedback. The approach directly outputs quantiles, scales to large datasets via forking sequences, and demonstrates substantial improvements in both forecast accuracy and volatility reduction across large-scale and public benchmarks. Key findings include up to 33% gains in seasonal peak accuracy, large reductions in excess forecast volatility, and a 38% improvement over prior state-of-the-art on a retail forecasting dataset. The methods offer practical benefits for high-volume forecasting tasks in supply chain and related domains, with notable gains in throughput and prediction reliability.
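
The summary above says the model outputs quantiles directly but does not state the training objective. A common choice for quantile outputs is the pinball (quantile) loss, sketched below with NumPy. The function name `pinball_loss` and its arguments are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for a quantile level q in (0, 1).

    Illustrative sketch only; the name and signature are hypothetical.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    # Under-forecasts (diff > 0) cost q per unit of error;
    # over-forecasts (diff < 0) cost (1 - q) per unit of error.
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))
```

With `q = 0.9`, under-forecasting a demand of 10 by 2 units costs about 1.8, while over-forecasting it by 2 units costs about 0.2. This asymmetry is what pushes a P90 forecast above the median.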

Abstract

Recent advances in neural forecasting have produced major improvements in accuracy for probabilistic demand prediction. In this work, we propose novel improvements to the current state of the art by incorporating changes inspired by recent advances in Transformer architectures for Natural Language Processing. We develop a novel decoder-encoder attention for context alignment, improving forecasting accuracy by allowing the network to study its own history based on the context for which it is producing a forecast. We also present a novel positional encoding that allows the neural network to learn context-dependent seasonality functions as well as arbitrary holiday distances. Finally, we show that the current state-of-the-art MQ-Forecaster (Wen et al., 2017) models display excess variability because they fail to leverage previous errors in the forecast to improve accuracy. We propose a novel decoder self-attention scheme for forecasting that significantly reduces the excess variation of the forecast.
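
The decoder-encoder attention described above can be loosely illustrated with standard scaled dot-product attention, where each target horizon supplies a query and the past encoded contexts supply keys and values. This is a minimal sketch, not the paper's architecture: the function name `horizon_attention`, the tensor shapes, and the choice of a single head without learned projections are all assumptions.

```python
import numpy as np

def horizon_attention(queries, keys, values):
    """Scaled dot-product attention over past encoded contexts.

    queries: (H, d)   one query per target horizon (hypothetical shape)
    keys:    (T, d)   one key per past encoded context
    values:  (T, d_v) one value per past encoded context
    Returns the attended contexts (H, d_v) and the weights (H, T).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)             # (H, T)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over past periods
    return weights @ values, weights
```

Each row of the returned weights sums to 1. A horizon whose query resembles the key of a past period, such as an earlier promotion week, places most of its weight on that period's encoded context.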

Paper Structure

This paper contains 29 sections, 7 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Example of a demand forecasting task where periods $T-3$ and $T+2$ both have promotions. The encoded context $h_{T-3}$, the last time the item had a promotion, contains useful information for forecasting for target periods that also have a promotion. The horizon-specific attention aligns past encoded contexts with the target horizon.
  • Figure 2: The decoder attends over past forecasts for the same target horizon -- the context $h_t$ contains feedback information (demand and other encoded signals), allowing the model to adjust forecasts that are either too volatile or not volatile enough as the target date approaches.
  • Figure 3: Martingale diagnostic process $\{V_t\}$ for P50 (left) and P90 (right) forecasts. Trajectories are demand-weighted, and the results are averaged over all items and target weeks in the test period (2018-2019). Closer to zero is better.
  • Figure 4: Components of the MQTransformer architecture
  • Figure 5: MQTransformer architecture with learned global/local positional encoding, horizon-specific decoder-encoder attention, and decoder self-attention
  • ...and 1 more figure