TwinS: Revisiting Non-Stationarity in Multivariate Time Series Forecasting

Jiaxi Hu; Qingsong Wen; Sijie Ruan; Li Liu; Yuxuan Liang

TL;DR

The paper addresses non-stationarity in multivariate time series forecasting (MTSF) by introducing TwinS, a Transformer-based model with three modules: Wavelet Convolution, for multi-scale embedding of nested periods; Periodic Aware Attention, which uses a convolutional sub-network to generate period-sensitive attention scores; and a Channel-Temporal Mixer MLP, which captures inter-series relationships including hysteresis. Empirically, TwinS achieves state-of-the-art performance on five real-world datasets, with up to 25.8% MSE reduction over PatchTST, and ablations validate the contribution of each module. Rather than forcing the data to be stationary, the work models non-stationary periodic distributions directly, offering a practical and scalable approach for long-horizon MTSF. This matters for real-world forecasting, where nested, shifting, and interdependent periodic patterns are common.

Abstract

Multivariate time series forecasting has recently attracted increasing attention because of its practical importance, leading to a variety of deep forecasting models. However, real-world time series exhibit pronounced non-stationary distribution characteristics. These characteristics are not limited to the time-varying statistical properties highlighted by the Non-stationary Transformer; they also encompass three key aspects: nested periodicity, absence of periodic distributions, and hysteresis among time variables. In this paper, we first validate this view through wavelet analysis and then propose TwinS, a Transformer-based model with three modules that address non-stationary periodic distributions: Wavelet Convolution, Period-Aware Attention, and Channel-Temporal Mixed MLP. Specifically, Wavelet Convolution models nested periods by scaling the convolution kernel size, as a wavelet transform does. Period-Aware Attention guides the attention computation with period relevance scores generated by a convolutional sub-network. Channel-Temporal Mixed MLP captures the overall relationships between time series through joint channel-time mixing. TwinS achieves state-of-the-art performance compared with mainstream time series models, with a maximum MSE improvement of 25.8% over PatchTST.
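The abstract says Wavelet Convolution handles nested periods by scaling the convolution kernel size the way a wavelet transform scales its mother wavelet. The NumPy sketch below illustrates only that kernel-scaling idea, under our own assumptions: a single hypothetical `base_kernel` is stretched by a factor of 2**s at scale s and rescaled by 1/sqrt(2**s), and each scale yields one row of the embedding. The paper's module may differ (for example, in how kernels are learned or how scales are combined), so this is not its implementation.

```python
import numpy as np


def wavelet_conv_embed(x: np.ndarray, base_kernel: np.ndarray, num_scales: int) -> np.ndarray:
    """Hypothetical multi-scale embedding of a 1-D series x.

    At scale s the base kernel is stretched by 2**s (each tap repeated
    2**s times) and rescaled by 1/sqrt(2**s), mimicking how a wavelet
    transform dilates its mother wavelet. Returns shape (num_scales, len(x)).
    """
    outputs = []
    for s in range(num_scales):
        factor = 2 ** s
        kernel = np.repeat(base_kernel, factor) / np.sqrt(factor)
        if kernel.size > x.size:
            raise ValueError(f"kernel at scale {s} is longer than the input")
        # mode="same" keeps each output aligned with the input time steps
        outputs.append(np.convolve(x, kernel, mode="same"))
    return np.stack(outputs)


# Toy series with two nested periods (24 and 6 steps)
t = np.arange(96)
x = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 6)
emb = wavelet_conv_embed(x, base_kernel=np.array([1.0, -1.0]), num_scales=4)
print(emb.shape)  # (4, 96)
```

With `base_kernel=[1, -1]`, the stretched kernels (lengths 2, 4, 8, 16) are Haar-like, so each row responds most strongly to fluctuations at a different time scale.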
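Period-Aware Attention is described as guiding attention with period relevance scores produced by a convolutional sub-network. The exact sub-network and combination rule are not given here, so the sketch below assumes a simple version: a depthwise 1-D convolution over the patch tokens yields local features `c`, their scaled dot products form a period-relevance matrix, and that matrix is added to the usual content scores before the softmax. All function names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conv_along_tokens(h: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Depthwise 1-D convolution over the token axis: each feature column
    # of h (shape (N, d)) is convolved with the same small kernel.
    return np.stack(
        [np.convolve(h[:, j], kernel, mode="same") for j in range(h.shape[1])], axis=1
    )


def period_aware_attention(h, w_q, w_k, w_v, kernel):
    """Hypothetical sketch of period-guided attention.

    h: (N, d) patch tokens; w_q, w_k, w_v: (d, d) projections;
    kernel: small 1-D kernel standing in for the convolutional sub-network.
    """
    d = h.shape[1]
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    content_scores = q @ k.T / np.sqrt(d)
    c = conv_along_tokens(h, kernel)        # local features from the conv sub-network
    period_scores = c @ c.T / np.sqrt(d)    # assumed period-relevance scores
    weights = softmax(content_scores + period_scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
N, d = 12, 8
h = rng.normal(size=(N, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, attn = period_aware_attention(h, w_q, w_k, w_v, kernel=np.array([0.25, 0.5, 0.25]))
print(out.shape, attn.shape)  # (12, 8) (12, 12)
```

With an all-zero kernel the period scores vanish and the function reduces to ordinary scaled dot-product attention, which makes the added term easy to isolate.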
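The Channel-Temporal Mixed MLP is described as capturing relationships between series through channel-time mixing. One way to realize joint mixing, assumed here rather than taken from the paper, is to flatten the (channel, time) grid and pass it through a two-layer MLP with a residual connection, so every output entry can depend on every channel at every time step, including lagged cross-channel effects such as hysteresis. `hidden_dim` plays a role analogous to the CT-MLP hidden dimension varied in Figure 4(d); all names are hypothetical.

```python
import numpy as np


def gelu(z: np.ndarray) -> np.ndarray:
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))


def ct_mixer_mlp(x, w1, b1, w2, b2):
    """Hypothetical channel-temporal mixing MLP.

    x: (C, T) array of C channels over T time steps (or patch tokens).
    Flattening lets every output entry depend on every (channel, time)
    input; the residual connection keeps the original signal.
    """
    c, t = x.shape
    flat = x.reshape(c * t)
    hidden = gelu(flat @ w1 + b1)   # (hidden_dim,)
    mixed = hidden @ w2 + b2        # (C * T,)
    return x + mixed.reshape(c, t)


rng = np.random.default_rng(0)
C, T, hidden_dim = 3, 16, 32
x = rng.normal(size=(C, T))
w1 = rng.normal(size=(C * T, hidden_dim)) * 0.1
b1 = np.zeros(hidden_dim)
w2 = rng.normal(size=(hidden_dim, C * T)) * 0.1
b2 = np.zeros(C * T)
y = ct_mixer_mlp(x, w1, b1, w2, b2)
print(y.shape)  # (3, 16)
```

The weight matrices grow with C·T·hidden_dim, so this fully joint form is only practical for modest numbers of channels and tokens.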

Paper Structure

This paper contains 18 sections, 16 equations, 7 figures, and 4 tables.

INTRODUCTION
RELATED WORK
Methodology of the TwinS
Wavelet Convolution Embedding
Periodic Modeling
Channel-Temporal Mixer MLP
Experiments
Main Result
Ablation Study
Further Analysis
Conclusion
Implementation Details
Datasets
Baselines
Experiments Setting
...and 3 more sections

Figures (7)

Figure 1: Examples of the wv (wind velocity) and wd (wind direction) variables in the Weather dataset. The x-axis shows the shared time steps. The upper panel shows the variable values over time; the lower panel is the wavelet level plot, which shows the signal's energy at different time-frequency scales.
Figure 2: Overall architecture of TwinS with detailed structure of Periodic Aware Attention and Encoder-Layer.
Figure 3: Left: Architecture of Wavelet Convolution. Right: Frequency components captured by multi-channel convolutions, which make the model aware of missing periodic information.
Figure 4: Further analysis. (a) Visualization of attention matrices in PatchTST. (b) Visualization of attention matrices in TwinS. (c) MSE versus patch length on the Weather dataset. (d) MSE versus the hidden dimension of CT-MLP on the ETTm2 dataset. Panels (c) and (d) are normalized by the optimal result to highlight the differences.
Figure 5: Examples of FFT images.
...and 2 more figures
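Figure 1's lower panel plots signal energy across time-frequency scales. The sketch below computes a comparable per-level energy map, assuming an orthonormal Haar discrete wavelet transform (the paper's figure may use a different wavelet or a continuous transform). The synthetic signal switches from an 8-step period to a 64-step period halfway through, a toy stand-in for a shifting periodic distribution; `haar_energy_levels` is a hypothetical helper.

```python
import numpy as np


def haar_energy_levels(x: np.ndarray, levels: int):
    """Per-level, per-position energy of a multi-level Haar DWT.

    Returns (energies, approx): energies[j] holds the squared detail
    coefficients at level j + 1 (time-localized energy at that scale),
    and approx is the final approximation. len(x) must be divisible
    by 2**levels.
    """
    if x.size % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    approx = x.astype(float)
    energies = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # high-pass (orthonormal Haar)
        approx = (even + odd) / np.sqrt(2.0)   # low-pass, passed to the next level
        energies.append(detail ** 2)
    return energies, approx


t = np.arange(256)
# Short period (8 steps) in the first half, long period (64 steps) in the second half
x = np.where(t < 128, np.sin(2 * np.pi * t / 8), np.sin(2 * np.pi * t / 64))
energies, approx = haar_energy_levels(x, levels=5)
for j, e in enumerate(energies, start=1):
    print(j, round(float(e.sum()), 2))
```

Because the Haar transform is orthonormal, the detail energies plus the final approximation energy add up to the energy of the input, and the level-1 energy concentrates in the first half of the signal, where the short period lives.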
