HUTFormer: Hierarchical U-Net Transformer for Long-Term Traffic Forecasting

Zezhi Shao, Fei Wang, Tao Sun, Chengqing Yu, Yuchen Fang, Guangyin Jin, Zhulin An, Yang Liu, Xiaobo Qu, Yongjun Xu

TL;DR

This work addresses the practical problem of long-term traffic forecasting (e.g., 1 day ahead) by introducing HUTFormer, a Hierarchical U-Net Transformer with a hierarchical encoder–decoder structure. It jointly learns and utilizes multi-scale representations via a window-based encoder and a cross-scale decoder, while segment embedding and spatial-temporal positional encoding keep the complexity manageable. A two-stage training scheme first trains the encoder to produce an intermediate prediction, then freezes the encoder and trains the decoder to refine that prediction, enabling effective multi-scale integration. Extensive experiments on four traffic datasets and three generalization datasets demonstrate state-of-the-art accuracy and efficiency, with ablations validating the necessity of the hierarchical design, embedding strategies, and training protocol.
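To make the complexity argument concrete, the following back-of-the-envelope sketch counts query-key pairs in full self-attention before and after grouping time steps into segments. The 5-minute sampling interval, the segment length of 12, and the helper name `attention_pairs` are illustrative assumptions, not values taken from the paper.

```python
def attention_pairs(num_steps: int, segment_len: int = 1) -> int:
    """Count query-key pairs in full self-attention after splitting
    num_steps time steps into non-overlapping segments (one token each)."""
    if num_steps % segment_len != 0:
        raise ValueError("num_steps must be divisible by segment_len")
    num_tokens = num_steps // segment_len
    return num_tokens * num_tokens


# Assumption: one day of readings sampled every 5 minutes -> 24 * 12 = 288 steps.
steps_per_day = 24 * 12

raw_pairs = attention_pairs(steps_per_day)        # one token per time step
seg_pairs = attention_pairs(steps_per_day, 12)    # hypothetical segment length 12
reduction = raw_pairs // seg_pairs

print(raw_pairs, seg_pairs, reduction)  # 82944 576 144
```

Under these assumptions, embedding segments rather than individual time steps shrinks the attention matrix by the square of the segment length, which is why a day-long input becomes tractable.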

Abstract

Traffic forecasting, which aims to predict traffic conditions based on historical observations, has been an enduring research topic and is widely recognized as an essential component of intelligent transportation. Recent proposals on Spatial-Temporal Graph Neural Networks (STGNNs) have made significant progress by combining sequential models with graph convolution networks. However, due to high complexity issues, STGNNs only focus on short-term traffic forecasting (e.g., 1-h ahead), while ignoring more practical long-term forecasting. In this paper, we make the first attempt to explore long-term traffic forecasting (e.g., 1-day ahead). To this end, we first reveal its unique challenges in exploiting multi-scale representations. Then, we propose a novel Hierarchical U-Net TransFormer (HUTFormer) to address the issues of long-term traffic forecasting. HUTFormer consists of a hierarchical encoder and decoder to jointly generate and utilize multi-scale representations of traffic data. Specifically, for the encoder, we propose window self-attention and segment merging to extract multi-scale representations from long-term traffic data. For the decoder, we design a cross-scale attention mechanism to effectively incorporate multi-scale representations. In addition, HUTFormer employs an efficient input embedding strategy to address the complexity issues. Extensive experiments on four traffic datasets show that the proposed HUTFormer significantly outperforms state-of-the-art traffic forecasting and long time series forecasting baselines.
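The encoder idea described in the abstract, window self-attention followed by segment merging, can be sketched in a few lines of NumPy. This is a simplified illustration rather than the authors' implementation: it uses a single head with no learned query/key/value projections, non-overlapping windows, and a merge rule (concatenate each adjacent pair of tokens, then apply a linear projection) that is assumed by analogy with patch merging in hierarchical vision Transformers. All function names and sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_self_attention(x, window):
    """Self-attention restricted to non-overlapping windows.

    x has shape (T, d). Tokens attend only to tokens in the same window.
    Assumption: identity Q/K/V projections, single head.
    """
    T, d = x.shape
    if T % window != 0:
        raise ValueError("sequence length must be divisible by window size")
    xw = x.reshape(T // window, window, d)              # (num_windows, window, d)
    scores = xw @ xw.transpose(0, 2, 1) / np.sqrt(d)    # (num_windows, window, window)
    out = softmax(scores) @ xw                          # (num_windows, window, d)
    return out.reshape(T, d)


def segment_merging(x, weight):
    """Halve the sequence length: concatenate adjacent token pairs, then project.

    x has shape (T, d); weight has shape (2 * d, d_out).
    Assumption: pairwise concatenation is the merge rule.
    """
    T, d = x.shape
    if T % 2 != 0:
        raise ValueError("sequence length must be even")
    merged = x.reshape(T // 2, 2 * d)   # row i holds tokens 2i and 2i+1 side by side
    return merged @ weight


rng = np.random.default_rng(0)
x = rng.normal(size=(24, 8))    # toy input: 24 segment tokens, 8 features
w = rng.normal(size=(16, 8))    # stand-in for a learned projection

# Two encoder stages yield representations at three temporal scales.
scales = [x]
for _ in range(2):
    h = window_self_attention(scales[-1], window=3)
    scales.append(segment_merging(h, w))

print([s.shape for s in scales])  # [(24, 8), (12, 8), (6, 8)]
```

Each stage keeps attention cost linear in sequence length, because every window has a fixed size. Merging widens the effective receptive field at the next stage, so the stacked outputs form the multi-scale representations that a cross-scale decoder could consume.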

Paper Structure (22 sections, 10 equations, 9 figures, 6 tables)

This paper contains 22 sections, 10 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Examples of long-term traffic forecasting.
  • Figure 2: Overview of the proposed HUTFormer. Left: The hierarchical encoder. It generates multi-scale features for traffic data using window Transformer layers and segment merging, and makes an intermediate prediction. Right: The hierarchical decoder. It fine-tunes the intermediate prediction by incorporating multi-scale features through cross-scale Transformer layers. In addition, segment embedding and spatial-temporal positional encoding are proposed to address complexity issues.
  • Figure 3: Standard self-attention vs. window self-attention.
  • Figure 4: An illustration of segment merging.
  • Figure 5: Spatial topologies of METR-LA and PEMS-BAY datasets.
  • ...and 4 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2