Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
Yuqi Li, Chuanguang Yang, Hansheng Zeng, Zeyu Dong, Zhulin An, Yongjun Xu, Yingli Tian, Hao Wu
TL;DR
The paper tackles the computational burden of state-of-the-art spatiotemporal forecasting models by introducing Spectral Decoupled Knowledge Distillation (SDKD). It pairs a frequency-decoupled teacher (a CNN-based high-frequency extractor and a Transformer-based low-frequency modeler) with a lightweight student, and transfers multi-scale spectral information from teacher to student through frequency-aligned distillation, including a multi-teacher gradient-weighting mechanism (A2D). The approach yields strong predictive performance while significantly reducing inference time across multiple datasets, preserving both local transients and global evolution. The framework offers a general, architecture-agnostic pathway for deploying high-accuracy spatiotemporal models on resource-constrained platforms.
Abstract
Spatiotemporal forecasting tasks, such as traffic flow, combustion dynamics, and weather forecasting, often require complex models that suffer from low training efficiency and high memory consumption. This paper proposes a lightweight framework, Spectral Decoupled Knowledge Distillation (SDKD), which transfers the multi-scale spatiotemporal representations of a complex teacher model to a more efficient, lightweight student network. The teacher model follows an encoder, latent evolution, decoder architecture, in which the latent evolution module decouples high-frequency details from low-frequency trends: convolution captures the high-frequency details, and a Transformer serves as the global low-frequency modeler. However, the teacher's multi-layer convolution and deconvolution structures result in slow training and high memory usage. To address these issues, we propose a frequency-aligned knowledge distillation strategy that extracts multi-scale spectral features, covering both high- and low-frequency components, from the teacher's latent space and uses them to guide the lightweight student model in capturing both local fine-grained variations and global evolution patterns. Experimental results show that SDKD significantly improves performance, achieving reductions of up to 81.3% in MSE and 52.3% in MAE on the Navier-Stokes equation dataset. The framework effectively captures both high-frequency variations and long-term trends while reducing computational complexity. Our code is available at https://github.com/itsnotacie/SDKD
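
The idea of frequency-aligned distillation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract describes a teacher that decouples frequencies with convolutional and Transformer branches, whereas this sketch substitutes a simple radial FFT mask to split latent features into two bands. The function names (`split_bands`, `frequency_aligned_loss`), the `cutoff` value, and the band weights `w_low` and `w_high` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def split_bands(feat, cutoff=0.25):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    Assumption: a radial low-pass mask in the 2D Fourier domain stands in
    for the teacher's learned frequency decoupling.
    """
    _, h, w = feat.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies in cycles/sample
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies in cycles/sample
    low_mask = (np.sqrt(fy ** 2 + fx ** 2) <= cutoff).astype(feat.dtype)

    spec = np.fft.fft2(feat, axes=(-2, -1))
    low = np.fft.ifft2(spec * low_mask, axes=(-2, -1)).real
    high = feat - low  # the residual carries the high-frequency content
    return low, high


def frequency_aligned_loss(student_feat, teacher_feat,
                           w_low=1.0, w_high=1.0, cutoff=0.25):
    """Weighted sum of per-band MSE between student and teacher features."""
    s_low, s_high = split_bands(student_feat, cutoff)
    t_low, t_high = split_bands(teacher_feat, cutoff)
    loss_low = np.mean((s_low - t_low) ** 2)     # global, slowly varying trends
    loss_high = np.mean((s_high - t_high) ** 2)  # local, fine-grained detail
    return w_low * loss_low + w_high * loss_high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.standard_normal((4, 16, 16))
    student = teacher + 0.1 * rng.standard_normal(teacher.shape)
    print(frequency_aligned_loss(student, teacher))
```

Penalizing the two bands separately lets each be weighted on its own, so the student is pushed to match both the teacher's slowly varying structure and its fine-grained detail rather than only an average over the two. In a real training loop, a term of this kind would typically be added to the student's ordinary forecasting loss.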
