A Multi-Scale Decomposition MLP-Mixer for Time Series Analysis

Shuhan Zhong, Sizhe Song, Weipeng Zhuo, Guanyao Li, Yang Liu, S.-H. Gary Chan

TL;DR

This work tackles time series analysis by addressing the need for explicit decomposition of multi-scale temporal patterns and inter-channel dependencies. It introduces MSD-Mixer, a task-general backbone built on MLP-Mixer with a novel multi-scale temporal patching scheme and a residual loss that enforces decomposition completeness. By decomposing X into layered components S_i across k layers and modeling them via Patch Encoder/Decoder modules, the approach achieves superior performance across long-term and short-term forecasting, imputation, anomaly detection, and classification, while remaining efficient. The results demonstrate that combining decomposition with multi-scale sub-series modeling yields significant gains, suggesting broad practical impact for real-world time series applications.
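
One way to read the layered decomposition described above is as an additive scheme with a running residual. The residual symbols Z_i are introduced here for illustration only; this summary names X, S_i, and k but does not define a residual symbol, and additivity is an assumption of this sketch.

```latex
Z_0 = X, \qquad Z_i = Z_{i-1} - S_i \quad (i = 1, \dots, k), \qquad X = \sum_{i=1}^{k} S_i + Z_k
```

Under this reading, decomposition completeness means that the final residual Z_k carries little remaining structure, which is what a loss on the residual's mean and autocorrelation would encourage.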

Abstract

Time series data, both univariate and multivariate, are characterized by unique composition and complex multi-scale temporal variations. Analyzing them often requires special consideration of decomposition and multi-scale modeling. Existing deep learning methods are best suited to univariate time series, and have not sufficiently considered sub-series modeling and decomposition completeness. To address these challenges, we propose MSD-Mixer, a Multi-Scale Decomposition MLP-Mixer, which learns to explicitly decompose the input time series and represent the resulting components in its different layers. To handle the multi-scale temporal patterns and multivariate dependencies, we propose a novel temporal patching approach to model the time series as multi-scale patches, and employ MLPs to capture intra- and inter-patch variations and channel-wise correlations. In addition, we propose a novel loss function that constrains both the mean and the autocorrelation of the decomposition residual for better decomposition completeness. Through extensive experiments on various real-world datasets for five common time series analysis tasks, we demonstrate that MSD-Mixer consistently and significantly outperforms other state-of-the-art algorithms with better efficiency.
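
The two ideas named in the abstract, temporal patching at several scales and a penalty on the residual's mean and autocorrelation, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `patch`, `unpatch`, and `residual_penalty`, the left zero-padding, and the plain squared-penalty form are all assumptions made for illustration.

```python
import numpy as np

def patch(x, patch_size):
    """Split a (channels, length) series into non-overlapping patches.

    Returns shape (channels, num_patches, patch_size). The series is
    zero-padded on the left to a multiple of patch_size; the padding
    side is an assumption of this sketch.
    """
    channels, length = x.shape
    pad = (-length) % patch_size
    x = np.pad(x, ((0, 0), (pad, 0)))
    return x.reshape(channels, -1, patch_size)

def unpatch(p, length):
    """Invert patch(): flatten the patches and drop the left padding."""
    channels = p.shape[0]
    return p.reshape(channels, -1)[:, -length:]

def residual_penalty(z, max_lag=None):
    """Hypothetical penalty: squared mean plus mean squared autocorrelation.

    z has shape (channels, length). A residual that looks like white
    noise should score low; one with leftover periodic structure
    should score high.
    """
    channels, length = z.shape
    if max_lag is None:
        max_lag = length - 1
    mean_term = np.mean(z.mean(axis=1) ** 2)
    zc = z - z.mean(axis=1, keepdims=True)
    denom = np.sum(zc ** 2, axis=1) + 1e-8
    # Sample autocorrelation per channel for lags 1..max_lag.
    acf = np.stack(
        [np.sum(zc[:, lag:] * zc[:, :-lag], axis=1) / denom
         for lag in range(1, max_lag + 1)],
        axis=1,
    )
    return mean_term + np.mean(acf ** 2)

# Multi-scale view of one series: coarse to fine patch sizes (illustrative values).
x = np.random.default_rng(0).standard_normal((3, 96))
for size in (24, 12, 4):
    print(size, patch(x, size).shape)
```

Within each patched array, the last axis holds the variation inside a patch, the middle axis the variation across patches, and the first axis the channels, which are the three directions the abstract says the MLPs mix along.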

Paper Structure (38 sections, 8 equations, 6 figures, 12 tables, 1 algorithm)

Figures (6)

  • Figure 1: (a) Decomposition of time series. (b) Comparison of single time points and multi-scale sub-series.
  • Figure 2: MSD-Mixer overview.
  • Figure 3: Examples of multi-scale temporal patching. The channel dimension is omitted for simplicity.
  • Figure 4: (a) MLP block. (b) Patch Encoder. (c) Patch Decoder.
  • Figure 5: Examples of decomposition.
  • ...and 1 more figure