Does Vector Quantization Fail in Spatio-Temporal Forecasting? Exploring a Differentiable Sparse Soft-Vector Quantization Approach
Chao Chen, Tian Zhou, Yanjun Zhao, Hui Liu, Liang Sun, Rong Jin
TL;DR
Vector quantization (VQ) underperforms in spatio-temporal forecasting. The paper traces this to two bottlenecks of hard VQ, non-differentiability and limited representation power, and proposes Differentiable Sparse Soft-Vector Quantization (SVQ). Inspired by sparse regression, SVQ uses a two-layer MLP to produce regression weights over a large codebook, so each input is quantized as a differentiable combination of multiple codewords that preserves detail while suppressing noise. The codebook can be static and random or learnable, and SVQ integrates as a plug-in placed before the translator of backbones such as SimVP and various MetaFormers. On benchmarks including WeatherBench-S, WeatherBench-M, TaxiBJ, KittiCaltech, and Human3.6M, it reports state-of-the-art results: a 7.9% MSE improvement on WeatherBench-S temperature, a 9.4% average MAE reduction in video prediction, and a 17.3% LPIPS improvement, with stable training and modest computational overhead. The work provides theoretical and empirical support for differentiable sparse VQ in forecasting, and the code is publicly released.
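The core mechanism summarized above can be sketched in a few lines of pure Python. This is a minimal, hypothetical illustration, not the authors' implementation: the names `SoftVQSketch`, `matvec`, and `relu`, the layer sizes, the omitted biases, and the Gaussian initialization are all assumptions, and the sketch does not reproduce whatever sparsity mechanism SVQ uses, since the summary does not specify it. It shows only the shape of the computation: a two-layer MLP maps an input vector to one regression weight per codeword of a static random codebook, and the quantized output is the weighted sum of those codewords.

```python
import random


def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


class SoftVQSketch:
    """Hypothetical soft-VQ layer (illustrative assumption, not SVQ's code):
    a two-layer MLP (biases omitted for brevity) maps an input vector to one
    regression weight per codeword; the output is the weighted codeword sum."""

    def __init__(self, dim, hidden, codebook_size, seed=0):
        rng = random.Random(seed)
        # Static random codebook: sampled once and never updated in this sketch.
        self.codebook = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                         for _ in range(codebook_size)]
        # MLP parameters (would be trained by backpropagation in practice).
        self.W1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                   for _ in range(hidden)]
        self.W2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)]
                   for _ in range(codebook_size)]

    def weights(self, x):
        """Regression weights over the codebook: W2 @ relu(W1 @ x)."""
        return matvec(self.W2, relu(matvec(self.W1, x)))

    def quantize(self, x):
        """Quantized vector sum_k w_k(x) * c_k, built from many codewords."""
        w = self.weights(x)
        dim = len(self.codebook[0])
        return [sum(w[k] * self.codebook[k][d] for k in range(len(w)))
                for d in range(dim)]


svq = SoftVQSketch(dim=8, hidden=32, codebook_size=256)
x = [0.1 * i for i in range(8)]
z = svq.quantize(x)
print(len(svq.weights(x)), len(z))  # 256 8
```

Because every step is a matrix product or a ReLU, gradients can flow from the quantized output back to the MLP (and to the codebook, if it is made learnable). That is the property hard VQ's nearest-codeword lookup lacks.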
Abstract
Spatio-temporal forecasting is crucial in various fields and requires a careful balance between identifying subtle patterns and filtering out noise. Vector quantization (VQ) appears well-suited for this purpose, as it quantizes input vectors into a set of codebook vectors or patterns. Although VQ has shown promise in various computer vision tasks, it surprisingly fails to improve the accuracy of spatio-temporal forecasting. We attribute this to two main issues in hard VQ: inaccurate optimization due to non-differentiability, and limited representation power. To tackle these challenges, we introduce Differentiable Sparse Soft-Vector Quantization (SVQ), the first VQ method to improve spatio-temporal forecasting. SVQ balances detail preservation with noise reduction, offering full differentiability and a solid foundation in sparse regression. Our approach employs a two-layer MLP and an extensive codebook to streamline the sparse regression process, significantly cutting computational costs while simplifying training and improving performance. Empirical studies on five spatio-temporal benchmark datasets show that SVQ achieves state-of-the-art results, including a 7.9% improvement on the WeatherBench-S temperature dataset and an average mean absolute error reduction of 9.4% on video prediction benchmarks (Human3.6M, KTH, and KittiCaltech), along with a 17.3% improvement in image quality (LPIPS). Code is publicly available at https://github.com/Pachark/SVQ-Forecasting.
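The two hard-VQ bottlenecks named in the abstract can be seen on a toy example. The sketch below, in pure Python with a hypothetical three-codeword codebook and input, compares hard VQ, which snaps x to its single nearest codeword, with a sparse-regression quantizer that represents x as a sparse weighted sum of codewords. We assume an L1-penalized (Lasso) formulation solved with ISTA, a standard solver swapped in for illustration; the paper's exact objective may differ, and according to the abstract SVQ replaces this kind of iterative solve with a two-layer MLP. The helper names (`sq_dist`, `hard_vq`, `ista`, `reconstruct`) and the values of `lam`, `lr`, and `steps` are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def hard_vq(x, codebook):
    """Hard VQ: snap x to its single nearest codeword (an argmin, not differentiable)."""
    return min(codebook, key=lambda c: sq_dist(x, c))


def ista(x, codebook, lam=0.01, lr=0.1, steps=500):
    """Lasso via ISTA: min_w 0.5*||x - sum_k w_k c_k||^2 + lam*||w||_1.
    lr must not exceed 1 / (largest eigenvalue of sum_k c_k c_k^T)."""
    K, D = len(codebook), len(x)
    w = [0.0] * K
    for _ in range(steps):
        # Residual of the current reconstruction and the gradient of the squared loss.
        resid = [sum(w[k] * codebook[k][d] for k in range(K)) - x[d] for d in range(D)]
        grads = [sum(resid[d] * codebook[k][d] for d in range(D)) for k in range(K)]
        for k in range(K):
            z = w[k] - lr * grads[k]
            # Soft-thresholding: the proximal step for the L1 penalty.
            w[k] = (1.0 if z >= 0 else -1.0) * max(0.0, abs(z) - lr * lam)
    return w


def reconstruct(w, codebook):
    """Weighted sum of codewords."""
    return [sum(wk * c[d] for wk, c in zip(w, codebook))
            for d in range(len(codebook[0]))]


codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]  # hypothetical toy codebook
x = [0.6, 0.4]

hard = hard_vq(x, codebook)
soft = reconstruct(ista(x, codebook), codebook)
print("hard VQ:", hard, "error", round(sq_dist(x, hard), 4))
print("sparse:", [round(v, 3) for v in soft], "error", round(sq_dist(x, soft), 4))

# Nudging x leaves the hard code unchanged (zero gradient almost everywhere),
# while the sparse reconstruction follows the input.
x2 = [0.61, 0.4]
print(hard_vq(x2, codebook) == hard)  # True
```

With this codebook, hard VQ can only return (1, 0) for x = (0.6, 0.4), leaving a squared error of 0.32, whereas a combination of two codewords reconstructs x almost exactly. Nudging x also leaves the hard code unchanged, so its gradient with respect to x is zero almost everywhere. This is why hard VQ needs surrogates such as the straight-through estimator, while a regression-based soft quantizer responds continuously to its input.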
