Unlocking the Power of Patch: Patch-Based MLP for Long-Term Time Series Forecasting
Peiwang Tang, Weitai Zhang
TL;DR
The paper questions the supremacy of Transformer models for long-term time series forecasting, arguing that patch-based input and cross-variable interactions largely drive performance. It introduces PatchMLP, a concise, fully-MLP architecture using Multi-Scale Patch Embedding, moving-average based Feature Decomposition, and intra-/inter-variable MLPs with a dot-product coupling to enable information exchange across variables. Empirical evaluations on eight real-world datasets show PatchMLP achieving state-of-the-art performance on all 16 benchmarks, often outperforming Transformer baselines. The results underscore the importance of cross-variable interactions and patch-based representations for efficient, accurate LTSF, suggesting a shift toward simpler, more interpretable models that emphasize locality and inter-variable synergy.
Abstract
Recent studies have attempted to refine the Transformer architecture to demonstrate its effectiveness in Long-Term Time Series Forecasting (LTSF) tasks. Although these refined models surpass many linear forecasting models with ever-improving performance, we remain skeptical of Transformers as a solution for LTSF. We attribute the effectiveness of these models largely to the adopted Patch mechanism, which enhances sequence locality to an extent yet fails to fully address the loss of temporal information inherent to the permutation-invariant self-attention mechanism. Further investigation suggests that simple linear layers augmented with the Patch mechanism may outperform complex Transformer-based LTSF models. Moreover, diverging from models that use channel independence, our research underscores the importance of cross-variable interactions in enhancing the performance of multivariate time series forecasting. The interaction information between variables is highly valuable but has been misapplied in past studies, leading to suboptimal cross-variable models. Based on these insights, we propose a novel and simple Patch-based MLP (PatchMLP) for LTSF tasks. Specifically, we employ simple moving averages to extract smooth components and noise-containing residuals from time series data, exchanging semantic information across variables through channel mixing and handling random noise with channel-independent processing. The PatchMLP model consistently achieves state-of-the-art results on several real-world datasets. We hope this surprising finding will spur new research directions in the LTSF field and pave the way for more efficient and concise solutions.
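To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a moving-average decomposition followed by patching at two scales. It is an illustration under stated assumptions, not the authors' implementation: the function names (`moving_average`, `decompose`, `make_patches`), the kernel size, the edge-replication padding, the non-overlapping patches, and the patch lengths 2 and 4 are all hypothetical choices made for the example.

```python
def moving_average(series, kernel_size):
    """Centered moving average; edges are padded by repeating the end values.

    The padding scheme is an assumption chosen so the output keeps the
    input length.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + kernel_size]) / kernel_size
            for i in range(len(series))]


def decompose(series, kernel_size=3):
    """Split a series into a smooth component and a residual (series - smooth)."""
    smooth = moving_average(series, kernel_size)
    residual = [x - s for x, s in zip(series, smooth)]
    return smooth, residual


def make_patches(series, patch_len):
    """Cut a series into non-overlapping patches, dropping an incomplete tail."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, patch_len)]


# Toy univariate series with one spike at index 3.
series = [1.0, 2.0, 3.0, 10.0, 5.0, 6.0, 7.0, 8.0]
smooth, residual = decompose(series, kernel_size=3)

# Patch the smooth component at two illustrative scales.
patches = {p: make_patches(smooth, p) for p in (2, 4)}
```

In this sketch the smooth component and the residual always sum back to the original series, so the decomposition loses no information. A model built on it is free to treat the two parts differently, for example by mixing channels for one and processing each channel independently for the other.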
