SWIFT: Mapping Sub-series with Wavelet Decomposition Improves Time Series Forecasting
Wenxuan Xie, Fanpu Cao
TL;DR
SWIFT is a lightweight, edge-friendly model for long-term time-series forecasting built on a first-order discrete wavelet transform (DWT) with a Haar basis. It decomposes the input into low- and high-frequency components, fuses them through a learnable filter, applies a single shared linear layer or shallow MLP to the resulting sub-series, and reconstructs the forecast with the inverse DWT (IDWT). SWIFT achieves competitive or state-of-the-art accuracy with orders-of-magnitude fewer parameters. Ablations confirm the benefits of the DWT, channel independence, and the shared mapping, with Haar wavelets giving the best trade-off between accuracy and efficiency. The work shows strong potential for real-time deployment on resource-constrained devices and suggests future extensions to multi-resolution wavelets and anomaly detection tasks.
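The pipeline summarized above (Haar DWT, cross-band fusion, one shared mapping, IDWT) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 2x2 `mix` matrix standing in for the learnable filter, and the look-back and horizon lengths are all hypothetical choices made for the example.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt(x):
    """Single-level Haar DWT along the last axis (length must be even)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / SQRT2  # low-frequency (approximation) band
    detail = (even - odd) / SQRT2  # high-frequency (detail) band
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed even/odd samples."""
    even = (approx + detail) / SQRT2
    odd = (approx - detail) / SQRT2
    out = np.empty(approx.shape[:-1] + (2 * approx.shape[-1],), dtype=approx.dtype)
    out[..., 0::2] = even
    out[..., 1::2] = odd
    return out


def swift_like_forward(x, mix, weight, bias):
    """Hypothetical forward pass for x of shape (channels, look_back).

    mix    : (2, 2) matrix, an assumed stand-in for the learnable cross-band filter
    weight : (look_back // 2, horizon // 2), shared by both bands and all channels
    bias   : (horizon // 2,)
    """
    approx, detail = haar_dwt(x)
    bands = np.stack([approx, detail], axis=-2)  # (channels, 2, look_back // 2)
    fused = mix @ bands                          # mix information across the two bands
    mapped = fused @ weight + bias               # one shared linear map per sub-series
    return haar_idwt(mapped[..., 0, :], mapped[..., 1, :])  # (channels, horizon)


# Illustrative sizes only: 7 channels, look-back 336, horizon 96.
rng = np.random.default_rng(0)
x = rng.standard_normal((7, 336))
weight = rng.standard_normal((168, 48)) * 0.01
forecast = swift_like_forward(x, np.eye(2), weight, np.zeros(48))
print(forecast.shape)  # (7, 96)
```

The round trip `haar_idwt(*haar_dwt(x))` recovers `x` up to floating-point error, which is the sense in which the decomposition acts as a lossless downsampling step: each band is half the input length, yet no information is discarded.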
Abstract
In recent work on time-series prediction, Transformers and even large language models have garnered significant attention due to their strong capabilities in sequence modeling. In practical deployments, however, time-series prediction often has to run in resource-constrained environments, such as edge devices, that cannot handle the computational overhead of large models. Some lightweight models have been proposed for such scenarios, but they perform poorly on non-stationary sequences. In this paper, we propose $\textit{SWIFT}$, a lightweight model that is both powerful and efficient in deployment and inference for Long-term Time Series Forecasting (LTSF). Our model rests on three key points: (i) using the wavelet transform to perform lossless downsampling of the time series; (ii) achieving cross-band information fusion with a learnable filter; and (iii) using only one shared linear layer or one shallow MLP to map the sub-series. We conduct comprehensive experiments, and the results show that $\textit{SWIFT}$ achieves state-of-the-art (SOTA) performance on multiple datasets, offering a promising method for edge computing and deployment in this task. Notably, $\textit{SWIFT-Linear}$ has only 25\% of the parameters of a single-layer linear model for time-domain prediction. Our code is available at https://github.com/LancelotXWX/SWIFT.
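One way to read the 25\% figure is as simple arithmetic on weight-matrix sizes: a first-order Haar DWT halves both the input and the output length of the mapping, so a single shared linear layer over the sub-series needs a quarter of the weights of a direct time-domain linear layer. The sketch below counts weights only, under the assumption that bias terms and the learnable filter are excluded; the look-back and horizon values are illustrative and not taken from the paper.

```python
def linear_weight_count(in_len: int, out_len: int) -> int:
    """Number of weights in a dense linear map from in_len to out_len (no bias)."""
    return in_len * out_len


# Illustrative lengths (assumptions for this example).
look_back, horizon = 720, 96

# Direct time-domain mapping: look_back -> horizon.
time_domain = linear_weight_count(look_back, horizon)

# Shared sub-series mapping after a first-order DWT: look_back/2 -> horizon/2,
# with the same weights reused for the low- and high-frequency bands.
shared_subseries = linear_weight_count(look_back // 2, horizon // 2)

ratio = shared_subseries / time_domain
print(time_domain, shared_subseries, ratio)  # 69120 17280 0.25
```

The ratio is independent of the particular lengths chosen, since $(L/2)(H/2) = LH/4$ for any even look-back $L$ and horizon $H$.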
