TwinS: Revisiting Non-Stationarity in Multivariate Time Series Forecasting
Jiaxi Hu, Qingsong Wen, Sijie Ruan, Li Liu, Yuxuan Liang
TL;DR
The paper addresses non-stationarity in multivariate time series forecasting by introducing TwinS, a Transformer-based model with three modules: Wavelet Convolution for multi-scale embedding of nested periods, Period-Aware Attention, which uses a convolutional sub-network to generate period-sensitive attention scores, and a Channel-Temporal Mixed MLP to capture inter-series relationships including hysteresis. Empirically, TwinS achieves state-of-the-art performance on five real-world datasets, with up to $25.8\%$ MSE reduction over PatchTST, and ablations validate the contribution of each module. The work emphasizes modeling non-stationary periodic distributions rather than forcing stationarity, offering a practical and scalable approach for long-horizon MTSF tasks. These findings are relevant to real-world forecasting, where nested, shifting, and interdependent periodic patterns are common.
Abstract
Recently, multivariate time series forecasting tasks have garnered increasing attention due to their significant practical applications, leading to the emergence of various deep forecasting models. However, real-world time series exhibit pronounced non-stationary distribution characteristics. These characteristics are not limited to the time-varying statistical properties highlighted by the Non-stationary Transformer but also encompass three key aspects: nested periodicity, absence of periodic distributions, and hysteresis among time variables. In this paper, we begin by validating this theory through wavelet analysis and propose the Transformer-based TwinS model, which consists of three modules to address the non-stationary periodic distributions: Wavelet Convolution, Period-Aware Attention, and Channel-Temporal Mixed MLP. Specifically, the Wavelet Convolution models nested periods by scaling the convolution kernel size, as in the wavelet transform. The Period-Aware Attention guides attention computation by generating period relevance scores through a convolutional sub-network. The Channel-Temporal Mixed MLP captures the overall relationships between time series through joint channel-temporal mixing. TwinS achieves SOTA performance compared to mainstream TS models, with a maximum improvement in MSE of 25.8\% over PatchTST.
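
To make the idea of "scaling the convolution kernel size" concrete, the following is a minimal NumPy sketch, not the TwinS implementation. It convolves a single series with one kernel shape sampled at dyadic widths, so that each output row responds to a different period length. The Ricker (Mexican-hat) kernel, the function names `ricker` and `multi_scale_conv`, and the parameters `base_width` and `n_scales` are all assumptions chosen for illustration; the abstract does not specify the mother wavelet, the scale schedule, or how the scales are combined.

```python
import numpy as np


def ricker(width: int) -> np.ndarray:
    """Sample a Ricker (Mexican-hat) wavelet on `width` points.

    Hypothetical choice of mother wavelet, used only for illustration.
    """
    t = np.linspace(-4.0, 4.0, width)
    w = (1.0 - t**2) * np.exp(-(t**2) / 2.0)
    return w / np.sqrt(np.sum(w**2))  # normalise to unit energy


def multi_scale_conv(x: np.ndarray, base_width: int = 9, n_scales: int = 3) -> np.ndarray:
    """Convolve a 1-D series with the same kernel shape at dyadic widths.

    Returns an array of shape (n_scales, len(x)); row j uses a kernel of
    width base_width * 2**j, so wider kernels respond to longer periods.
    """
    largest = base_width * 2 ** (n_scales - 1)
    if len(x) < largest:
        raise ValueError("series must be at least as long as the widest kernel")
    rows = []
    for j in range(n_scales):
        kernel = ricker(base_width * 2**j)
        # mode="same" keeps the output aligned with, and as long as, the input
        rows.append(np.convolve(x, kernel, mode="same"))
    return np.stack(rows)


# Toy series with a short period (24 steps) nested inside a long one (96 steps)
steps = np.arange(256)
series = np.sin(2 * np.pi * steps / 24) + 0.5 * np.sin(2 * np.pi * steps / 96)
features = multi_scale_conv(series)
print(features.shape)
```

In a learned model the fixed wavelet would presumably be replaced by trainable kernels, and the per-scale outputs would be merged into a single embedding; both details are outside what the abstract states.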
