Multiple-Resolution Tokenization for Time Series Forecasting with an Application to Pricing

Egon Peršak, Miguel F. Anjos, Sebastian Lautz, Aleksandar Kolev

TL;DR

The paper addresses forecasting on datasets with many time series and auxiliary variables, particularly in pricing, by introducing Multiple-Resolution Tokenization (MRT). MRT combines: (1) multiple-resolution patches of past data and of time-varying known (TVK) variables in a single token stream, (2) dedicated tokenization for static variables, (3) a cross-series channel mixer that extracts inter-series information, and (4) a novel reverse-splitting output head that scales favourably with the number of tokens. On a real-world markdown forecasting task and the public Favorita dataset, MRT outperforms the in-house methods and PatchTST in the settings tested, and ablations highlight the contribution of multiple resolutions and TVK tokens. These results suggest that MRT is a scalable approach for price-sensitive forecasting and for other multi-series forecasting problems with auxiliary variables, with applications in pricing, inventory, and related domains.

Abstract

We propose a transformer architecture for time series forecasting with a focus on time series tokenisation and apply it to a real-world prediction problem from the pricing domain. Our architecture aims to learn effective representations at many scales across all available data simultaneously. The model contains a number of novel modules: a differentiated form of time series patching which employs multiple resolutions, a multiple-resolution module for time-varying known variables, a mixer-based module for capturing cross-series information, and a novel output head with favourable scaling to account for the increased number of tokens. We present an application of this model to a real-world prediction problem faced by the markdown team at a very large retailer. In the experiments conducted, our model outperforms in-house models and the selected existing deep learning architectures.

Paper Structure (44 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: An illustration of Multiple Resolution Patching (MRP). A time series is divided into $k_i$ roughly equal parts according to resolutions in $K$. Each part is then mapped to a token.
  • Figure 2: Basis combination operation as used for tokenising time-varying known variables. A matrix is divided along the time dimension according to the resolution set $K$. The columns in each patch are combined using a resolution-wise learnt linear combination rule.
  • Figure 3: Channel mixer: a module for learning cross-series tokens. The channel mixer takes all other tokens and passes them through a mixer architecture which applies linear layers across the time and then the channel dimension.
  • Figure 4: An illustration of the reverse splitter output head. The reverse splitter operates only on post-transformer tokens corresponding to the original MRP positions. Each token is projected into a proportional part of the final forecast according to the corresponding resolution from the resolution set $K$. The forecasts produced by the different resolutions are then summed together to produce the final forecast.
  • Figure 5: Favorita experiment test results for the resolution set ablation.
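
The captions for Figures 1 and 4 describe two complementary index operations: cutting the input into $k_i$ roughly equal parts for each resolution in $K$, and mapping each output token back to a proportional slice of the forecast before summing over resolutions. The following is a minimal sketch of that bookkeeping only, not the authors' implementation. The function names, the integer-division rule for "roughly equal" parts, and the random weights standing in for learnt projections are all assumptions made for illustration; the token embedding and the transformer itself are omitted.

```python
import random


def split_bounds(length, k):
    """Boundaries cutting range(length) into k roughly equal contiguous parts.

    Assumption: integer division decides where the uneven parts fall.
    """
    return [i * length // k for i in range(k + 1)]


def multi_resolution_patches(series, resolutions):
    """For each k in the resolution set, cut the series into k patches."""
    patches = {}
    for k in resolutions:
        b = split_bounds(len(series), k)
        patches[k] = [series[b[i]:b[i + 1]] for i in range(k)]
    return patches


def linear(vec, weights):
    """Matrix-vector product; weights has one row per output value."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def reverse_split(tokens_by_res, horizon, d_model, seed=0):
    """Project each token to its slice of the horizon, then sum over resolutions.

    tokens_by_res maps a resolution k to its k tokens (lists of d_model floats).
    Random weights are a hypothetical stand-in for learnt projection layers.
    """
    rng = random.Random(seed)
    forecast = [0.0] * horizon
    for k, tokens in tokens_by_res.items():
        b = split_bounds(horizon, k)
        for i, tok in enumerate(tokens):
            out_len = b[i + 1] - b[i]  # proportional share of the horizon
            weights = [[rng.gauss(0, 1) for _ in range(d_model)]
                       for _ in range(out_len)]
            for offset, value in enumerate(linear(tok, weights)):
                forecast[b[i] + offset] += value
    return forecast


K = [1, 2, 4]  # example resolution set
patches = multi_resolution_patches(list(range(10)), K)
# Placeholder tokens, one per patch, in place of post-transformer outputs.
tokens = {k: [[1.0] * 3 for _ in range(k)] for k in K}
forecast = reverse_split(tokens, horizon=6, d_model=3)
```

In this sketch each token at resolution $k$ produces only about `horizon / k` output values, so the total number of projected values per resolution stays equal to the horizon regardless of how many tokens that resolution contributes.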