SWIFT: Mapping Sub-series with Wavelet Decomposition Improves Time Series Forecasting

Wenxuan Xie, Fanpu Cao

TL;DR

SWIFT introduces a lightweight, edge-friendly long-term time-series forecasting model based on a first-order DWT with Haar basis. By decomposing inputs into low- and high-frequency components, fusing them through a learnable filter, and applying a single shared linear/MLP mapping before reconstructing with IDWT, SWIFT achieves competitive or state-of-the-art accuracy with orders-of-magnitude fewer parameters. The approach is reinforced by ablations that confirm the benefits of DWT, channel-independence, and a shared mapping, with Haar wavelets delivering the best trade-off between accuracy and efficiency. The work demonstrates strong potential for real-time deployment on resource-constrained devices and suggests future extensions to multi-resolution wavelets and anomaly detection tasks.
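The decomposition and reconstruction steps named above can be illustrated with a minimal pure-Python sketch of a first-level Haar DWT and its inverse. This is not the authors' implementation: the function names `haar_dwt` and `haar_idwt`, the orthonormal scaling by the square root of 2, and the sign convention for the detail coefficients are assumptions chosen for illustration.

```python
import math

def haar_dwt(x):
    """First-level Haar DWT of an even-length sequence.

    Returns (approx, detail); each is half the length of x.
    Assumed convention: orthonormal scaling, detail = (even - odd) / sqrt(2).
    """
    if len(x) % 2 != 0:
        raise ValueError("sequence length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]  # low-frequency band
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]  # high-frequency band
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild each even/odd sample pair and interleave."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)  # even-indexed sample
        out.append((a - d) / s)  # odd-indexed sample
    return out

signal = [4.0, 2.0, 5.0, 9.0, 1.0, 1.0, 7.0, 3.0]  # made-up toy input
approx, detail = haar_dwt(signal)
restored = haar_idwt(approx, detail)
max_error = max(abs(u - v) for u, v in zip(signal, restored))
```

Each sub-series has half the length of the input, and `max_error` stays at floating-point rounding level, which is the sense in which the downsampling loses no information.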

Abstract

In recent work on time-series prediction, Transformers and even large language models have garnered significant attention due to their strong capabilities in sequence modeling. However, in practical deployments, time-series prediction often has to run in resource-constrained environments, such as edge devices, which cannot handle the computational overhead of large models. To address such scenarios, some lightweight models have been proposed, but they perform poorly on non-stationary sequences. In this paper, we propose SWIFT, a lightweight model that is not only powerful but also efficient in deployment and inference for Long-term Time Series Forecasting (LTSF). Our model is based on three key points: (i) using the wavelet transform to perform lossless downsampling of time series; (ii) achieving cross-band information fusion with a learnable filter; and (iii) using only one shared linear layer or one shallow MLP to map the sub-series. We conduct comprehensive experiments, and the results show that SWIFT achieves state-of-the-art (SOTA) performance on multiple datasets, offering a promising method for edge computing and deployment in this task. Moreover, it is noteworthy that the number of parameters in SWIFT-Linear is only 25% of what it would be with a single-layer linear model for time-domain prediction. Our code is available at https://github.com/LancelotXWX/SWIFT.
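One plausible reading of the 25% figure is the following back-of-the-envelope count. It assumes that a first-level DWT halves both the look-back length T and the horizon length S seen by the mapping, that one weight matrix is shared by both sub-series, and that biases and the small learnable filter are ignored. The function names and the values T = 96 and S = 720 are illustrative assumptions, not taken from the abstract.

```python
def time_domain_linear_params(T, S):
    """Weights in one linear layer mapping T inputs to S outputs (bias ignored)."""
    return T * S

def shared_subseries_linear_params(T, S):
    """Weights in one linear layer shared by both sub-series after a
    first-level DWT: it maps T/2 coefficients to S/2 coefficients."""
    return (T // 2) * (S // 2)

T, S = 96, 720  # assumed look-back and horizon lengths, for illustration only
full = time_domain_linear_params(T, S)
shared = shared_subseries_linear_params(T, S)
ratio = shared / full
print(full, shared, ratio)  # 69120 17280 0.25
```

Under these assumptions the ratio is (T/2)(S/2) / (TS) = 1/4 for any even T and S, independent of the specific lengths chosen.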

Paper Structure

This paper contains 40 sections, 1 theorem, 19 equations, 5 figures, 10 tables.

Key Result

Proposition 1.1

Let $\mathcal{S}$ be a dynamical system governing an observable variable $y_t$ and a latent state $x_t \in \mathbb{R}^d$. Under the assumption of fading memory, the future observation $y_{t+1}$ can be approximated by a function of a finite history window $y_{t:t-k}$. Specifically, the approximation

Figures (5)

  • Figure 1: MSE performance on a simple synthetic non-stationary signal, with forecasting starting at the 96th time step.
  • Figure 2: Overall structure of SWIFT. (i) The DWT module decomposes the time series into two sub-series, the approximation coefficients and the detail coefficients, based on the Haar wavelet; (ii) a convolutional layer is applied for filtering and feature aggregation; (iii) a linear layer or MLP maps the sub-series to make the prediction. T denotes the length of the look-back window, N is the number of variables (i.e., channels), and S is the length of the prediction horizon.
  • Figure 3: Performing convolution in the wavelet domain ($\ell=1$) results in a larger receptive field. In this example, a convolution is able to have a receptive field of 4 with a kernel size of 2.
  • Figure 4: Visualization results of weight maps trained on the ECL dataset. From left to right are $W_s$, $W_l$ and $W_h$.
  • Figure 5: Visualization of DWT decomposition on the Traffic dataset.
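
The receptive-field claim in the Figure 3 caption can be checked numerically with a small sketch. It assumes, for illustration only, that a stride-1 convolution with kernel size 2 is applied to the first-level Haar approximation coefficients; the helper names and the kernel weights are hypothetical and not taken from the paper.

```python
import math

def haar_approx(x):
    """First-level Haar approximation coefficients (orthonormal scaling)."""
    s = math.sqrt(2.0)
    return [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]

def conv1d_valid(seq, kernel):
    """Stride-1 sliding-window weighted sum without padding."""
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def receptive_field(output_index, length, kernel):
    """Input positions whose perturbation changes one chosen output value."""
    base = [0.0] * length
    ref = conv1d_valid(haar_approx(base), kernel)[output_index]
    touched = []
    for pos in range(length):
        probe = list(base)
        probe[pos] = 1.0  # unit impulse at a single input position
        out = conv1d_valid(haar_approx(probe), kernel)[output_index]
        if out != ref:
            touched.append(pos)
    return touched

kernel = [0.5, -0.25]  # arbitrary nonzero weights, kernel size 2
field = receptive_field(output_index=1, length=8, kernel=kernel)
print(field)  # [2, 3, 4, 5]
```

Each wavelet-domain coefficient already summarizes two time-domain samples, so a size-2 kernel over adjacent coefficients covers four consecutive input samples.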

Theorems & Definitions (2)

  • Proposition 1.1
  • proof