MSDformer: Multi-scale Discrete Transformer For Time Series Generation

Zhicheng Chen, Shibo Feng, Xi Xiao, Zhong Zhang, Qing Li, Xingyu Gao, Peilin Zhao

TL;DR

MSDformer tackles synthetic time series generation by introducing a two-stage, discrete-token framework that captures multi-scale temporal patterns. It combines a multi-scale time series tokenizer built from cascaded VQ-VAEs with a multi-scale autoregressive Transformer to model token sequences, enabling coarse-to-fine generation in the discrete latent space. The authors ground the approach in rate-distortion theory, showing that discrete token modeling (DTM) affords explicit control over distortion via codebook size, and that multi-scale modeling increases the effective rate to reduce distortion. Empirically, MSDformer and its predecessor SDformer outperform GAN-, VAE-, and DDPM-based baselines across six datasets, with MSDformer delivering substantial gains in long-term generation and fidelity while maintaining reasonable inference efficiency. The work suggests that multi-scale DTM is a powerful paradigm for time series synthesis and points to future extensions in adaptive scaling and spatiotemporal generation.

Abstract

Discrete Token Modeling (DTM), which employs vector quantization techniques, has demonstrated remarkable success in modeling non-natural language modalities, particularly in time series generation. While our prior work SDformer established the first DTM-based framework to achieve state-of-the-art performance in this domain, two critical limitations persist in existing DTM approaches: 1) their inability to capture multi-scale temporal patterns inherent to complex time series data, and 2) the absence of theoretical foundations to guide model optimization. To address these challenges, we propose a novel multi-scale DTM-based time series generation method, called Multi-Scale Discrete Transformer (MSDformer). MSDformer employs a multi-scale time series tokenizer to learn discrete token representations at multiple scales, which jointly characterize the complex nature of time series data. Subsequently, MSDformer applies a multi-scale autoregressive token modeling technique to capture the multi-scale patterns of time series within the discrete latent space. Theoretically, we validate the effectiveness of the DTM method and the rationality of MSDformer through the rate-distortion theorem. Comprehensive experiments demonstrate that MSDformer significantly outperforms state-of-the-art methods. Both theoretical analysis and experimental results demonstrate that incorporating multi-scale information and modeling multi-scale patterns can substantially enhance the quality of generated time series in DTM-based approaches. The code will be released upon acceptance.

Paper Structure

This paper contains 26 sections, 2 theorems, 33 equations, 5 figures, 13 tables, 3 algorithms.

Key Result

Theorem 1

For a stationary ergodic source $X \sim P_X$ and a bounded distortion measure $d: \mathcal{X} \times \tilde{\mathcal{X}} \to \mathbb{R}^+$, the minimum achievable rate $R(D)$ at distortion level $D$ is given by:

$$R(D) = \min_{P_{\tilde{X}|X}:\ \mathbb{E}[d(X, \tilde{X})] \leq D} I(X; \tilde{X}),$$

where $I(X; \tilde{X})$ is the mutual information between the source $X$ and the reconstruction $\tilde{X}$.
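To make the rate-distortion trade-off in Theorem 1 concrete, the sketch below evaluates the well-known closed form for one special case that is not taken from the paper: an i.i.d. Bernoulli($p$) source under Hamming distortion, where $R(D) = H(p) - H(D)$ for $0 \leq D < \min(p, 1-p)$ and $R(D) = 0$ otherwise. The function names are illustrative assumptions.

```python
import math


def binary_entropy(p):
    """Entropy H(p) of a Bernoulli(p) variable, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bernoulli_rate_distortion(p, distortion):
    """R(D) in bits per symbol for an i.i.d. Bernoulli(p) source
    with Hamming distortion (illustrative special case, not from the paper)."""
    if not 0.0 <= p <= 1.0 or distortion < 0.0:
        raise ValueError("need 0 <= p <= 1 and distortion >= 0")
    # Beyond min(p, 1 - p), always guessing the likelier symbol already
    # meets the distortion budget, so no bits are needed.
    if distortion >= min(p, 1 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(distortion)


# Allowing more distortion lowers the required rate.
rates = [bernoulli_rate_distortion(0.5, d) for d in (0.0, 0.1, 0.25, 0.5)]
```

For a fair coin, lossless coding ($D = 0$) needs one bit per symbol, and the required rate falls monotonically to zero as the permitted distortion grows to $0.5$.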

Figures (5)

  • Figure 1: The workflow of multi-scale time series tokenizer. The multi-scale time series tokenizer comprises $K$ modules, each of which is a VQ-VAE model that employs a similarity-driven vector quantization method to learn discrete token representations at different scales. Specifically, the first scale is used to model the large-scale information of the time series, while each subsequent scale models the difference between the input and output of the previous scale. The discrete token sequences obtained at each scale are then combined to jointly characterize the complete time series data. Notably, when $K=1$, the model degenerates to SDformer’s single-scale time series tokenizer [zhicheng2024sdformer], as shown by the red dashed line in the figure.
  • Figure 2: The workflow of the second stage of MSDformer: Multi-Scale Autoregressive Token Modeling. In this stage, the [BOS] token is first concatenated with the discrete token sequence of each scale. The resulting sequence, along with the token type ID and position ID, is then fed into the Decoder-only Transformer to predict the next token at each position. Notably, when $K=1$, the model degenerates to SDformer’s single-scale autoregressive token modeling [zhicheng2024sdformer].
  • Figure 3: Kernel density estimation visualizations of the time series synthesized by MSDformer, SDformer and Diffusion-TS.
  • Figure 4: t-SNE visualizations of the time series synthesized by MSDformer, SDformer and Diffusion-TS.
  • Figure 5: Example of Multi-scale Visualization of Time Series for $K=2$. The first two columns illustrate the multi-scale reconstructions derived from hierarchical representations, specifically showing the coarse-scale reconstruction ($k=1$) and the fine-scale reconstruction ($k=2$). The subsequent two columns display the final reconstruction results and the original data for comparison.
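
The captions of Figures 1 and 2 describe a residual cascade of tokenizers followed by a single token sequence for the Transformer. The following is a minimal NumPy sketch of that data flow, not the authors' implementation. Several pieces are assumptions made for illustration: each VQ-VAE is replaced by an average-pool "encoder", a Euclidean nearest-neighbour codebook lookup (rather than the similarity-driven quantization named in the caption), and a repeat "decoder"; the codebooks are random rather than learned; and the choices of BOS id, per-scale id offsets, token type IDs, and a single running position ID are hypothetical.

```python
import numpy as np


def quantize(z, codebook):
    """Index of the nearest codebook row (Euclidean) for each row of z."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def toy_scale(x, codebook, stride):
    """Stand-in for one VQ-VAE module (assumption): pool, look up, repeat.
    The series length must be divisible by stride."""
    length, channels = x.shape
    z = x.reshape(length // stride, stride, channels).mean(axis=1)
    idx = quantize(z, codebook)
    x_hat = np.repeat(codebook[idx], stride, axis=0)
    return idx, x_hat


def multiscale_tokenize(x, codebooks, strides):
    """Scale 1 sees x; each later scale sees the previous scale's residual."""
    tokens, recon, residual = [], np.zeros_like(x), x
    for codebook, stride in zip(codebooks, strides):
        idx, x_hat = toy_scale(residual, codebook, stride)
        tokens.append(idx)
        recon = recon + x_hat          # combine the per-scale outputs
        residual = residual - x_hat    # input minus output of this scale
    return tokens, recon, residual


def build_sequence(tokens, codebook_sizes):
    """[BOS] followed by each scale's tokens, with type and position IDs."""
    bos = sum(codebook_sizes)  # assumed: BOS id placed after all code ids
    offsets = np.cumsum([0] + list(codebook_sizes[:-1]))
    ids = [bos] + [int(t + off) for toks, off in zip(tokens, offsets) for t in toks]
    type_ids = [0] + [k + 1 for k, toks in enumerate(tokens) for _ in toks]
    pos_ids = list(range(len(ids)))
    return ids, type_ids, pos_ids


rng = np.random.default_rng(0)
x = rng.normal(size=(24, 2))                       # toy series: 24 steps, 2 channels
codebooks = [rng.normal(size=(8, 2)), 0.5 * rng.normal(size=(16, 2))]
tokens, recon, residual = multiscale_tokenize(x, codebooks, strides=[4, 2])
ids, type_ids, pos_ids = build_sequence(tokens, [8, 16])
```

With strides 4 and 2, the coarse scale yields 6 tokens and the fine scale 12, so the Transformer input has 19 entries including [BOS]. Setting a single codebook and stride corresponds to the $K=1$ case mentioned in the captions.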

Theorems & Definitions (3)

  • Theorem 1: Shannon's Rate-Distortion Theorem [shannon1959coding]
  • Theorem 2: Vector Quantization Rate-Distortion Control
  • proof
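
The statement of Theorem 2 is not reproduced in this listing, but the TL;DR summarizes its message: codebook size controls distortion, and adding scales raises the effective rate. A rough way to see the rate side is to count bits, since $N$ tokens drawn from a codebook of size $V$ can distinguish at most $V^N$ sequences, i.e. $N \log_2 V$ bits. The sketch below does this accounting; the token counts and codebook sizes are hypothetical and the function names are illustrative.

```python
import math


def token_rate_bits(num_tokens, codebook_size):
    """Upper bound on the bits carried by num_tokens codebook indices."""
    return num_tokens * math.log2(codebook_size)


def multiscale_rate_bits(scales):
    """Total bit budget over (num_tokens, codebook_size) pairs, one per scale."""
    return sum(token_rate_bits(n, v) for n, v in scales)


# Hypothetical settings: one coarse scale versus coarse plus fine.
single_scale = multiscale_rate_bits([(6, 512)])
two_scales = multiscale_rate_bits([(6, 512), (12, 512)])
```

Under these made-up settings the bit budget grows from 54 to 162 bits when the second scale is added, which by Theorem 1 permits a lower achievable distortion. Whether a trained model realizes that gain is an empirical question that this count does not answer.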