$\spadesuit$ SPADE $\spadesuit$ Split Peak Attention DEcomposition

Malcolm Wolff, Kin G. Olivares, Boris Oreshkin, Sunny Ruan, Sitan Yang, Abhinav Katoch, Shankar Ramasubramanian, Youxin Zhang, Michael W. Mahoney, Dmitry Efimov, Vincent Quenneville-Bélair

TL;DR

Peak events (PEs) such as promotions and holidays induce demand spikes and a carry-over bias in demand forecasts, which harms downstream inventory decisions. SPADE addresses this by splitting time series into peak and non-peak components with Masked Convolutions and by applying a specialized Peak Attention mechanism within a Sequence-to-Sequence framework that leverages known future indicators. The model is trained with a multi-quantile loss and reports substantial gains over prior production models: a $4.5\%$ overall improvement in post-peak-event (PPE) forecasts, up to $30\%$ for the forecasts most affected after promotions and holidays, and a $3.9\%$ improvement in PE accuracy. The work demonstrates scalability to hundreds of millions of series and offers a practical path to more reliable post-peak forecasting in retail.
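The TL;DR does not spell out the multi-quantile loss. A standard form is the pinball (quantile) loss averaged over quantile levels and forecast steps; the sketch below assumes that form. The function names `quantile_loss` and `multi_quantile_loss` and the equal weighting of steps and quantiles are illustrative choices, not details taken from the paper.

```python
def quantile_loss(y_true, y_pred, q):
    """Pinball loss for one observation at quantile level q in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs q per unit; over-prediction costs (1 - q) per unit.
    return max(q * diff, (q - 1) * diff)


def multi_quantile_loss(y_true, preds_by_q):
    """Mean pinball loss over all forecast steps and quantile levels.

    y_true: observed values, one per forecast step.
    preds_by_q: dict mapping a quantile level to predictions aligned with y_true.
    Equal weighting of steps and quantiles is an illustrative assumption.
    """
    total, count = 0.0, 0
    for q, preds in preds_by_q.items():
        if len(preds) != len(y_true):
            raise ValueError("each quantile needs one prediction per forecast step")
        for y, p in zip(y_true, preds):
            total += quantile_loss(y, p, q)
            count += 1
    return total / count


# Hypothetical two-step example with P50 and P90 forecasts.
loss = multi_quantile_loss([10, 12], {0.5: [10, 10], 0.9: [15, 15]})
print(round(loss, 4))  # 0.45
```

Because each over-predicted unit costs (1 - q), a median forecast that carries peak demand into post-peak periods is penalized for every excess unit.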

Abstract

Demand forecasting faces challenges induced by Peak Events (PEs), which correspond to special periods such as promotions and holidays. Peak events create significant spikes in demand, followed by demand ramp-down periods. Neural networks such as MQCNN and MQT overreact to demand peaks by carrying the elevated PE demand over into subsequent Post-Peak-Event (PPE) periods, resulting in significantly over-biased forecasts. To tackle this challenge, we introduce a neural forecasting model called Split Peak Attention DEcomposition (SPADE). This model reduces the impact of PEs on subsequent forecasts by decomposing forecasting into two separate tasks: one for PEs and one for all other periods. Its architecture uses masked convolution filters and a specialized Peak Attention module. We evaluate SPADE on a worldwide retail dataset with hundreds of millions of products. Relative to current production models, our results show an overall PPE improvement of 4.5%, a 30% improvement for the most affected forecasts after promotions and holidays, and a 3.9% improvement in PE accuracy.
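The abstract describes splitting forecasting into a peak task and a non-peak task, with masked convolution filters in the architecture. The exact filter is not given here, so the sketch below is one hypothetical reading: a causal moving-window convolution that skips time steps flagged as peaks and renormalizes the remaining (assumed non-negative) kernel weights, with the peak component taken as the residual above that baseline. The helpers `masked_causal_conv` and `split_peak` are illustrative names, not the paper's API.

```python
def masked_causal_conv(series, is_peak, kernel):
    """Causal 1-D convolution that ignores time steps flagged as peaks.

    kernel[0] weights the current step, kernel[1] the previous step, and so on.
    Peak steps (including the current one) are skipped and the remaining weights
    are renormalized, so elevated peak demand does not leak into later outputs.
    Assumes non-negative weights; returns None where the whole window is masked.
    """
    out = []
    for t in range(len(series)):
        num, den = 0.0, 0.0
        for j, w in enumerate(kernel):
            s = t - j
            if s < 0 or is_peak[s]:
                continue
            num += w * series[s]
            den += w
        out.append(num / den if den > 0 else None)
    return out


def split_peak(series, is_peak, kernel):
    """Hypothetical split into a non-peak baseline and a peak residual."""
    base = masked_causal_conv(series, is_peak, kernel)
    # Peak component: demand above the baseline, and only at flagged steps.
    peak = [
        y - b if p and b is not None else 0.0
        for y, b, p in zip(series, base, is_peak)
    ]
    return base, peak


series = [10, 10, 10, 40, 10, 10]
is_peak = [False, False, False, True, False, False]
kernel = [1.0, 1.0, 1.0]  # acts as a 3-step moving average after renormalization

print(masked_causal_conv(series, [False] * 6, kernel))
# [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]  (unmasked: the spike carries over)
print(split_peak(series, is_peak, kernel))
# ([10.0, 10.0, 10.0, 10.0, 10.0, 10.0], [0.0, 0.0, 0.0, 30.0, 0.0, 0.0])
```

In the unmasked run the spike lifts the next two outputs to 20.0, a toy version of the carry-over shown in Figure 1; with the peak masked, the baseline stays at 10.0 and the spike is isolated in the peak component.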

Paper Structure

This paper contains 12 sections, 6 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Illustration of "carry-over" degradation, visible as a downward trend in MQT's forecast. Peak values carry over, degrading MQT's forecast accuracy, whereas SPADE does not exhibit such an effect.
  • Figure 2: SPADE decomposes its temporal features to distinguish usual behavior from peaks.
  • Figure 3: SPADE shows evidence of forecast accuracy scaling with the number of training time series.
  • Figure 4: Masked convolutions enhance neural forecasting architectures by filtering peaks before inputting temporal features to the encoder, thus mitigating the peak carry-over effect.
  • Figure 5: The Peak Attention module regularizes the classic attention mechanism by sparsifying its weights using future covariate information.
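Figure 5 describes Peak Attention as classic attention whose weights are sparsified using future covariate information. The sketch below is a hypothetical single-query version, assuming the covariate is an event label known in advance for the forecast step: history steps whose label differs from it are masked to zero weight before the softmax. The function `peak_attention` and its label-matching rule are illustrative assumptions, not the paper's definition.

```python
import math


def peak_attention(query, keys, values, past_events, future_event):
    """Scaled dot-product attention for one query, sparsified by event labels.

    query: length-d vector for one forecast step.
    keys, values: per-history-step vectors (keys have length d).
    past_events: one label per history step (None means no peak event).
    future_event: label known in advance for the forecast step.
    Returns (weights, output); only steps with a matching label get weight.
    """
    scale = math.sqrt(len(query))
    scores = []
    for key, event in zip(keys, past_events):
        if event == future_event:
            scores.append(sum(q * k for q, k in zip(query, key)) / scale)
        else:
            scores.append(-math.inf)  # masked: excluded from the softmax
    allowed = [s for s in scores if s != -math.inf]
    if not allowed:
        raise ValueError("no history step matches the future event label")
    top = max(allowed)  # subtract the max for numerical stability
    exps = [0.0 if s == -math.inf else math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


keys = [[1.0, 0.0]] * 3
values = [[10.0], [40.0], [10.0]]
events = [None, "promo", None]
print(peak_attention([1.0, 0.0], keys, values, events, None))
# ([0.5, 0.0, 0.5], [10.0])
print(peak_attention([1.0, 0.0], keys, values, events, "promo"))
# ([0.0, 1.0, 0.0], [40.0])
```

With plain attention and these identical keys, each history step would get weight 1/3 and the non-peak output would be about 20 instead of 10; the label mask keeps a post-peak step from attending to the promotion.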