Efficient Traffic Prediction Through Spatio-Temporal Distillation

Qianru Zhang; Xinyi Gao; Haixin Wang; Siu-Ming Yiu; Hongzhi Yin

TL;DR

This work tackles scalability and over-smoothing in spatio-temporal graph neural networks for traffic forecasting by introducing LightST, a two-level spatio-temporal knowledge distillation framework that transfers knowledge from a GNN teacher to a lightweight MLP student. It combines explicit prediction-level distillation with implicit distribution alignment through contrastive learning across spatial and temporal embeddings, enhanced by adaptive embedding alignment to mitigate smoothing. Empirical results on five PeMS datasets show state-of-the-art accuracy with 5x–40x faster inference compared to baselines, validating practical applicability for real-time traffic forecasting. The approach also provides theoretical insights into reducing over-smoothing and demonstrates strong ablations supporting the necessity of both spatial and temporal distillation components.

Abstract

Graph neural networks (GNNs) have gained considerable attention in recent years for traffic flow prediction due to their ability to learn spatio-temporal pattern representations through a graph-based message-passing framework. Although GNNs have shown great promise in handling traffic datasets, their deployment in real-life applications has been hindered by scalability constraints arising from high-order message passing. Additionally, the over-smoothing problem of GNNs may lead to indistinguishable region representations as the number of layers increases, resulting in performance degradation. To address these challenges, we propose a new knowledge distillation paradigm termed LightST that transfers spatial and temporal knowledge from a high-capacity teacher to a lightweight student. Specifically, we introduce a spatio-temporal knowledge distillation framework that helps student MLPs capture graph-structured global spatio-temporal patterns while alleviating the over-smoothing effect with adaptive knowledge distillation. Extensive experiments verify that LightST significantly speeds up traffic flow predictions by 5X to 40X compared to state-of-the-art spatio-temporal GNNs, all while maintaining superior accuracy.
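The two distillation levels described above can be illustrated with a minimal sketch. This is not the paper's loss: the weights `alpha` and `beta`, the `temperature`, the choice of MAE/MSE terms, and every function name below are assumptions made for illustration, and the adaptive embedding alignment step is omitted. It only shows the general shape of combining a supervised term, an explicit prediction-level term (student matches teacher outputs), and an implicit contrastive term (student embeddings align with teacher embeddings of the same region).

```python
import math


def mae(pred, target):
    """Mean absolute error between two equal-length lists of floats."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(student_emb, teacher_emb, temperature=0.5):
    """InfoNCE-style contrastive loss (a stand-in, not the paper's formula).

    For region i, the pair (student i, teacher i) is the positive;
    teacher embeddings of all other regions act as negatives.
    """
    total = 0.0
    for i, s in enumerate(student_emb):
        logits = [cosine(s, t) / temperature for t in teacher_emb]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(student_emb)


def distillation_loss(student_pred, teacher_pred, target,
                      student_emb, teacher_emb, alpha=0.5, beta=0.1):
    """Hypothetical combined objective for the MLP student.

    supervised: student vs. ground truth
    explicit:   student vs. teacher predictions (prediction-level distillation)
    implicit:   contrastive alignment of student and teacher embeddings
    """
    supervised = mae(student_pred, target)
    explicit = mse(student_pred, teacher_pred)
    implicit = info_nce(student_emb, teacher_emb)
    return supervised + alpha * explicit + beta * implicit
```

In this sketch only the student would be updated with the combined loss; the teacher's predictions and embeddings are treated as fixed inputs, which is why inference needs the MLP alone.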

Paper Structure (18 sections, 10 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Methodology
Spatio-temporal Graph and Traffic Data
Spatio-Temporal Graph Neural Networks
Distillation Process
Discussion of Model
Evaluation
Experimental Setting
Effectiveness Evaluation
Model Scalability Investigation
Ablation Study and Effectiveness Analyses
Hyperparameter Study
Related Work
Conclusion
Appendices
...and 3 more sections

Figures (4)

Figure 1: Model performance comparison in terms of both traffic flow prediction accuracy and inference time. Lower MAE and RMSE indicate better performance. The symbol $40\times$ indicates that our LightST runs 40 times faster than a reference baseline method measured by inference time.
Figure 2: Our proposed spatio-temporal knowledge distillation framework, LightST, comprises two main components: a GNN-based teacher model and an MLP-based student model. The distillation paradigm itself is structured in two parts: spatio-temporal distillation and distribution alignment.
Figure 3: Ablation study of sub-modules in our spatio-temporal knowledge distillation paradigm.
Figure 4: Hyperparameter study on PeMSD8 and PeMSD3 in terms of MAE.
