Efficient Traffic Prediction Through Spatio-Temporal Distillation

Qianru Zhang, Xinyi Gao, Haixin Wang, Siu-Ming Yiu, Hongzhi Yin

TL;DR

This work tackles scalability and over-smoothing in spatio-temporal graph neural networks for traffic forecasting by introducing LightST, a two-level spatio-temporal knowledge distillation framework that transfers knowledge from a GNN teacher to a lightweight MLP student. It combines explicit prediction-level distillation with implicit distribution alignment through contrastive learning across spatial and temporal embeddings, and adds adaptive embedding alignment to mitigate over-smoothing. Empirical results on five PeMS datasets show state-of-the-art accuracy with 5x to 40x faster inference than spatio-temporal GNN baselines, supporting its use for real-time traffic forecasting. The paper also gives a theoretical account of how the approach reduces over-smoothing, and its ablation studies indicate that both the spatial and the temporal distillation components are necessary.

Abstract

Graph neural networks (GNNs) have gained considerable attention in recent years for traffic flow prediction due to their ability to learn spatio-temporal pattern representations through a graph-based message-passing framework. Although GNNs have shown great promise in handling traffic datasets, their deployment in real-life applications has been hindered by scalability constraints arising from high-order message passing. Additionally, the over-smoothing problem of GNNs may lead to indistinguishable region representations as the number of layers increases, resulting in performance degradation. To address these challenges, we propose a new knowledge distillation paradigm termed LightST that transfers spatial and temporal knowledge from a high-capacity teacher to a lightweight student. Specifically, we introduce a spatio-temporal knowledge distillation framework that helps student MLPs capture graph-structured global spatio-temporal patterns while alleviating the over-smoothing effect with adaptive knowledge distillation. Extensive experiments verify that LightST significantly speeds up traffic flow predictions by 5X to 40X compared to state-of-the-art spatio-temporal GNNs, all while maintaining superior accuracy.
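
The two-level distillation summarized above can be illustrated with a small sketch. This is not the authors' implementation: the use of MAE for the prediction-level term, the InfoNCE form of the contrastive alignment, the `temperature` value, and the weights `alpha` and `beta` are all assumptions chosen for illustration, and the function names are hypothetical. Each row of an embedding list stands for one region (or one time step), with the teacher row at the same index treated as the positive pair.

```python
import math


def mae(a, b):
    """Mean absolute error between two equal-length lists of floats."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(student_emb, teacher_emb, temperature=0.5):
    """InfoNCE-style alignment loss (assumed form, not taken from the paper).

    Row i of the student is pulled toward row i of the teacher and pushed
    away from every other teacher row.
    """
    s = [_normalize(v) for v in student_emb]
    t = [_normalize(v) for v in teacher_emb]
    total = 0.0
    for i, s_i in enumerate(s):
        # Cosine similarity of student row i with every teacher row.
        logits = [sum(a * b for a, b in zip(s_i, t_j)) / temperature for t_j in t]
        # Numerically stable log-sum-exp over all teacher rows.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(s)


def distillation_loss(student_pred, teacher_pred, target,
                      student_emb, teacher_emb, alpha=0.5, beta=0.1):
    """Combine task loss, prediction-level distillation, and alignment.

    alpha and beta are hypothetical weights for this sketch.
    """
    task = mae(student_pred, target)            # fit the ground truth
    pred_kd = mae(student_pred, teacher_pred)   # explicit: mimic teacher outputs
    align = info_nce(student_emb, teacher_emb)  # implicit: align embeddings
    return task + alpha * pred_kd + beta * align
```

In a setting like the one the abstract describes, `student_emb` would come from the MLP student and `teacher_emb` from the frozen GNN teacher, and the alignment term would presumably be computed once over spatial embeddings and once over temporal embeddings. Only the student would be evaluated at inference time, which is where the reported speedup comes from.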

Paper Structure (18 sections, 10 equations, 4 figures, 3 tables, 1 algorithm)

Figures (4)

  • Figure 1: Model performance comparison in terms of both traffic flow prediction accuracy and inference time. Lower MAE and RMSE indicate better performance. The symbol $40\times$ indicates that our LightST runs 40 times faster than a reference baseline method measured by inference time.
  • Figure 2: Our proposed spatio-temporal knowledge distillation framework, LightST, comprises two main components: a GNN-based teacher model and an MLP-based student model. The distillation paradigm itself is structured in two parts: spatio-temporal distillation and distribution alignment.
  • Figure 3: Ablation study of sub-modules in our spatio-temporal knowledge distillation paradigm.
  • Figure 4: Hyperparameter study on PeMSD8 and PeMSD3 in terms of MAE.