HDT: Hierarchical Discrete Transformer for Multivariate Time Series Forecasting

Shibo Feng; Peilin Zhao; Liu Liu; Pengcheng Wu; Zhiqi Shen

TL;DR

HDT addresses high-dimensional multivariate time series forecasting with long horizons by converting targets to discrete tokens through a two-stage vector-quantized process and applying a two-level, self-conditioned Transformer to model priors over these tokens. The low-level stage captures long-term trends from downsampled representations, while the high-level stage generates full target tokens conditioned on these trends, enabling long-range accuracy and fast inference. Empirical results across five real-world datasets show substantial improvements in probabilistic and deterministic forecasts over state-of-the-art baselines, with notable gains over VQ-TR and diffusion-based methods in high-dimensional settings. The approach offers improved scalability and practical impact for high-dimensional MTS forecasting, with future work exploring unified discretization and multimodal integration.

Abstract

Generative models have gained significant attention in multivariate time series (MTS) forecasting, particularly for their ability to generate high-fidelity samples. Forecasting the probability distribution of a multivariate time series is a challenging yet practical task. Despite recent attempts, two major challenges persist: 1) some existing generative methods underperform in high-dimensional MTS forecasting and are hard to scale to higher dimensions; 2) the inherently high dimensionality of MTS constrains the forecasting lengths of existing generative models. In this paper, we point out that discrete token representations can model high-dimensional MTS with faster inference, and that conditioning the forecast on the target's own long-term trend can extend the forecasting length while maintaining high accuracy. Motivated by this, we propose a vector-quantized framework called Hierarchical Discrete Transformer (HDT), which encodes time series as discrete tokens using an l2-normalization-enhanced vector quantization strategy and thereby recasts MTS forecasting as discrete token generation. To address the limitations of generative models in long-term forecasting, HDT is hierarchical: the low level captures the discrete long-term trend of the target, and the high level uses this trend as a condition to generate the discrete representation of the target, introducing features of the target itself to extend the forecasting length in high-dimensional MTS. Extensive experiments on five popular MTS datasets verify the effectiveness of the proposed method.
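
The abstract's core mechanism, mapping continuous representations to discrete tokens with an l2-normalized codebook lookup, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `l2_normalize` and `quantize`, the codebook size (8), and the latent dimension (4) are all hypothetical choices made for illustration, and the encoder, decoder, and training losses are omitted entirely. It only shows why normalizing both latents and codes reduces nearest-code search to a cosine-similarity argmax.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit l2 norm (eps guards against zero rows)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    Both latents and codes are l2-normalized first. For unit vectors,
    ||z - c||^2 = 2 - 2 * (z . c), so the nearest code in Euclidean
    distance is the one with the highest cosine similarity.
    """
    z = l2_normalize(latents)       # (n, d)
    c = l2_normalize(codebook)      # (k, d)
    sims = z @ c.T                  # (n, k) cosine similarities
    tokens = sims.argmax(axis=-1)   # (n,) discrete token ids
    return tokens, c[tokens]        # ids and their quantized unit vectors


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))  # 8 codes of dimension 4
latents = rng.normal(size=(5, 4))   # 5 latent vectors to tokenize

tokens, quantized = quantize(latents, codebook)
```

Here `tokens` holds one integer id per latent vector, which is the kind of discrete sequence a Transformer prior could then model; how HDT actually builds its latents and its two token levels is described in the paper, not in this sketch.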
