DRFormer: Multi-Scale Transformer Utilizing Diverse Receptive Fields for Long Time-Series Forecasting

Ruixin Ding, Yuqi Chen, Yu-Ting Lan, Wei Zhang

TL;DR

DRFormer tackles long-term time-series forecasting by removing the reliance on fixed patch lengths through a dynamic tokenizer with sparse learning to capture diverse receptive fields. It builds multi-scale representations via hierarchical pooling and a group-aware Transformer with gRoPE, followed by deconvolution-based fusion to predict future sequences. Empirical results on multiple real-world datasets show state-of-the-art performance, with consistent gains over both Transformer-based and non-Transformer baselines, and ablations confirm the contributions of dynamic modeling, multi-scale modeling, and advanced position encoding. The work offers a transferable framework for patch-based time-series modeling that reduces the need for expert patch-length selection and effectively captures cross-scale dependencies.
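
As a rough illustration of the hierarchical pooling step, the sketch below builds coarser token sequences by repeatedly max-pooling patch tokens along the time axis and then tags each token with its scale (group) index. It is a minimal sketch under stated assumptions, not DRFormer's code: the pooling kernel of 2, the (N, d) token layout, and the helper names `hierarchical_max_pool` and `concat_with_group_ids` are hypothetical choices made for illustration.

```python
import numpy as np


def hierarchical_max_pool(tokens: np.ndarray, num_scales: int, kernel: int = 2) -> list:
    """Return up to num_scales sequences; scale s holds N // kernel**s tokens.

    tokens: (N, d) array of patch tokens (hypothetical layout).
    """
    scales = [tokens]
    current = tokens
    for _ in range(num_scales - 1):
        n_out = current.shape[0] // kernel  # trailing tokens that do not fill a window are dropped
        if n_out == 0:
            break
        windows = current[: n_out * kernel].reshape(n_out, kernel, current.shape[1])
        current = windows.max(axis=1)  # one pooled token summarizes `kernel` tokens of the finer scale
        scales.append(current)
    return scales


def concat_with_group_ids(scales: list) -> tuple:
    """Hypothetical helper: stack all scales into one sequence and tag each token with its scale index."""
    tokens = np.concatenate(scales, axis=0)
    group_ids = np.concatenate([np.full(len(s), g) for g, s in enumerate(scales)])
    return tokens, group_ids


tokens = np.arange(16, dtype=float).reshape(8, 2)  # 8 toy tokens of width 2
scales = hierarchical_max_pool(tokens, num_scales=3)
print([s.shape for s in scales])  # [(8, 2), (4, 2), (2, 2)]
seq, groups = concat_with_group_ids(scales)
print(groups)  # [0 0 0 0 0 0 0 0 1 1 1 1 2 2]
```

In this toy setup each pooled token covers twice as many patches as a token one scale finer, which is the intuition behind stacking pooled scales. How DRFormer fuses the scales afterwards (the deconvolution step) is not reproduced here.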

Abstract

Long-term time series forecasting (LTSF) has been widely applied in finance, traffic prediction, and other domains. Recently, patch-based transformers have emerged as a promising approach, segmenting data into sub-level patches that serve as input tokens. However, existing methods mostly rely on predetermined patch lengths, necessitating expert knowledge and posing challenges in capturing diverse characteristics across various scales. Moreover, time series data exhibit diverse variations and fluctuations across different temporal scales, which traditional approaches struggle to model effectively. In this paper, we propose a dynamic tokenizer with a dynamic sparse learning algorithm to capture diverse receptive fields and sparse patterns of time series data. In order to build hierarchical receptive fields, we develop a multi-scale Transformer model, coupled with multi-scale sequence extraction, capable of capturing multi-resolution features. Additionally, we introduce a group-aware rotary position encoding technique to enhance intra- and inter-group position awareness among representations across different temporal scales. Our proposed model, named DRFormer, is evaluated on various real-world datasets, and experimental results demonstrate its superiority compared to existing methods. Our code is available at: https://github.com/ruixindingECNU/DRFormer.
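
The group-aware rotary position encoding mentioned above builds on standard rotary position encoding (RoPE). The sketch below implements plain RoPE only and checks its defining property: after rotation, the query-key score depends on the two positions only through their difference. How gRoPE assigns intra- and inter-group positions is not described in this summary, so the `positions` argument is left to the caller; feeding each token's index within its own scale would be one hypothetical choice. The function name `rope` and the base of 10000 follow common RoPE practice and are assumptions here.

```python
import numpy as np


def rope(x: np.ndarray, positions, base: float = 10000.0) -> np.ndarray:
    """Standard rotary position encoding for x of shape (n, d) with d even.

    Channel pair (2i, 2i+1) of token t is rotated by angle positions[t] * base**(-2i/d).
    """
    _, d = x.shape
    if d % 2:
        raise ValueError("RoPE needs an even feature dimension")
    inv_freq = base ** (-np.arange(0, d, 2) / d)  # (d/2,) rotation frequencies
    angles = np.asarray(positions, dtype=float)[:, None] * inv_freq[None, :]  # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty(x.shape, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin  # 2-D rotation of each channel pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def score(q: np.ndarray, k: np.ndarray, m: float, n: float) -> float:
    """Dot product between a query at position m and a key at position n."""
    return (rope(q, [m]) @ rope(k, [n]).T).item()


rng = np.random.default_rng(0)
q, k = rng.normal(size=(1, 8)), rng.normal(size=(1, 8))
print(np.isclose(score(q, k, 3, 1), score(q, k, 10, 8)))  # True: only the offset m - n matters
```

Because each channel pair is rotated rather than shifted, the encoding also preserves vector norms, so position information enters attention without changing token magnitudes.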

Paper Structure

This paper contains 39 sections, 16 equations, 7 figures, 3 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Overview of DRFormer. DRFormer first applies a dynamic tokenizer to capture diverse receptive fields, then applies hierarchical max pooling to exploit the multi-resolution nature of time series data. The resulting multi-resolution representations are encoded by a group-aware Transformer and finally processed by a deconvolution operation.
  • Figure 2: Illustration of static patching, the dynamic tokenizer, and multi-scale sequence extraction. (1) Taking $P = 16$ as an example, the input sequence is transformed into $N$ patches. (2) The dynamic linear layer is divided into $G$ groups, and the exploration region of each group is shown in a red box. Here the number of groups $G$ is set to $4$ and the sparse ratio $SR$ to $0.5$. Purple (blue) circles indicate active (inactive) weights. (3) Hierarchical max pooling on the patched tokens yields multi-group representations with a more comprehensive set of receptive fields, as given by the paper's multi-scale receptive-field equation.
  • Figure 3: The performance of DRFormer on ETTh1 and ETTm1 across varying numbers of multi-scale sequences.
  • Figure 4: The performance of Transformer (w/o DT) and DRFormer on the ETTh2 and ETTm1 datasets under different predetermined patch lengths.
  • Figure 5: Visualization of forecasting results on the Traffic dataset with input length $I = 96$ and prediction length $O = 192$. Black lines show the input sequence and grey lines the history preceding it. Green (red) lines show the ground truth (prediction), and blue (red) dashed lines mark the periodicity of the ground truth (prediction). Circles of different diameters represent different receptive fields.
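
Panels (1) and (2) of Figure 2 can be approximated in code as follows. This is a hypothetical sketch, not the paper's algorithm: the non-overlapping stride, the nested "exploration regions" (group g sees only the last ceil(P(g+1)/G) steps of each patch), the reading of $SR$ as the fraction of switched-off weights inside a region, magnitude-based selection, and the helper names `static_patches` and `grouped_sparse_mask` are all assumptions. A dynamic sparse learning scheme would also recompute such a mask during training rather than once, as done here.

```python
import math

import numpy as np


def static_patches(series: np.ndarray, patch_len: int = 16, stride: int = 16) -> np.ndarray:
    """Panel 1: cut a 1-D series into N = (L - patch_len) // stride + 1 patches."""
    n = (len(series) - patch_len) // stride + 1
    return np.stack([series[i * stride : i * stride + patch_len] for i in range(n)])


def grouped_sparse_mask(weight: np.ndarray, num_groups: int = 4, sparse_ratio: float = 0.5) -> np.ndarray:
    """Panel 2 (hypothetical): boolean mask for a tokenizer weight of shape (d_model, patch_len).

    Output rows are split into num_groups equal groups (d_model must divide evenly).
    Group g may only connect to the last ceil(patch_len * (g + 1) / num_groups) inputs,
    and within that region only the largest-magnitude (1 - sparse_ratio) fraction stays active.
    """
    d_model, patch_len = weight.shape
    rows_per_group = d_model // num_groups
    mask = np.zeros(weight.shape, dtype=bool)
    for g in range(num_groups):
        rows = slice(g * rows_per_group, (g + 1) * rows_per_group)
        start = patch_len - math.ceil(patch_len * (g + 1) / num_groups)  # region = columns start..end
        region = np.abs(weight[rows, start:])
        keep = round(region.size * (1.0 - sparse_ratio))
        active = np.zeros(region.size, dtype=bool)
        if keep > 0:
            active[np.argsort(region, axis=None)[-keep:]] = True  # flat indices of the largest magnitudes
        mask[rows, start:] = active.reshape(region.shape)
    return mask


series = np.arange(96, dtype=float)  # toy input of length L = 96
patches = static_patches(series)  # (6, 16) with P = 16
weight = np.random.default_rng(0).normal(size=(8, 16))
mask = grouped_sparse_mask(weight, num_groups=4, sparse_ratio=0.5)
tokens = patches @ (weight * mask).T  # (6, 8): one 8-dim token per patch
print(patches.shape, int(mask.sum()), tokens.shape)  # (6, 16) 40 (6, 8)
```

The nested windows give each group a different receptive field inside the same patch, which is the intuition the Figure 2 caption describes; the actual region layout and mask-update rule should be taken from the paper and its released code.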