A Multi-Scale Decomposition MLP-Mixer for Time Series Analysis

Shuhan Zhong; Sizhe Song; Weipeng Zhuo; Guanyao Li; Yang Liu; S. -H. Gary Chan

TL;DR

This work tackles time series analysis by addressing the need for explicit decomposition of multi-scale temporal patterns and inter-channel dependencies. It introduces MSD-Mixer, a task-general backbone built on MLP-Mixer with a novel multi-scale temporal patching scheme and a residual loss that enforces decomposition completeness. By decomposing X into layered components S_i across k layers and modeling them via Patch Encoder/Decoder modules, the approach achieves superior performance across long-term and short-term forecasting, imputation, anomaly detection, and classification, while remaining efficient. The results demonstrate that combining decomposition with multi-scale sub-series modeling yields significant gains, suggesting broad practical impact for real-world time series applications.

Abstract

Time series data, both univariate and multivariate, are characterized by unique composition and complex multi-scale temporal variations, and analyzing them often requires special consideration of decomposition and multi-scale modeling. Existing deep learning methods for this purpose are best suited to univariate time series only, and have not sufficiently considered sub-series modeling and decomposition completeness. To address these challenges, we propose MSD-Mixer, a Multi-Scale Decomposition MLP-Mixer, which learns to explicitly decompose the input time series into different components and to represent those components in its different layers. To handle the multi-scale temporal patterns and multivariate dependencies, we propose a novel temporal patching approach to model the time series as multi-scale patches, and employ MLPs to capture intra- and inter-patch variations and channel-wise correlations. In addition, we propose a novel loss function to constrain both the mean and the autocorrelation of the decomposition residual for better decomposition completeness. Through extensive experiments on various real-world datasets for five common time series analysis tasks, we demonstrate that MSD-Mixer consistently and significantly outperforms other state-of-the-art algorithms with better efficiency.
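The two ideas named in the abstract, splitting a series into patches at several scales and penalizing the mean and autocorrelation of the decomposition residual, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the zero left-padding, the univariate input, and the exact form of the penalty (squared mean plus mean squared autocorrelation) are all assumptions made here for illustration only.

```python
import math


def to_patches(series, patch_size):
    """Split a univariate series into non-overlapping patches.

    Assumption for this sketch: the series is left-padded with zeros
    so that its length becomes divisible by patch_size.
    """
    pad = (-len(series)) % patch_size
    padded = [0.0] * pad + list(series)
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]


def autocorrelation(values, lag):
    """Sample autocorrelation of values at the given lag (0.0 if constant)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0.0:
        return 0.0
    return sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom


def residual_penalty(residual, max_lag):
    """Hypothetical penalty: squared mean plus mean squared autocorrelation.

    A residual that still contains a trend or a periodic pattern has a
    nonzero mean or a large autocorrelation, so it scores higher.
    """
    mean = sum(residual) / len(residual)
    acf_sq = [autocorrelation(residual, lag) ** 2 for lag in range(1, max_lag + 1)]
    return mean ** 2 + sum(acf_sq) / len(acf_sq)


# The same series viewed at three patch sizes (coarse to fine).
series = [float(t) for t in range(10)]
multi_scale = {size: to_patches(series, size) for size in (5, 2, 1)}

# A residual that still carries a period-8 pattern is penalized heavily.
periodic = [math.sin(2 * math.pi * t / 8) for t in range(64)]
periodic_penalty = residual_penalty(periodic, max_lag=8)
```

In this sketch a larger patch size yields fewer, longer patches, so each scale exposes sub-series of a different duration. The penalty is zero for an all-zero residual and grows when structure is left behind in the residual, which is the intuition behind constraining its mean and autocorrelation.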

Paper Structure (38 sections, 8 equations, 6 figures, 12 tables, 1 algorithm)

Introduction
Related Works
Classical Methods
Deep Models without Decomposition
Deep Models with Decomposition
MSD-Mixer
Problem Settings
Time Series Analysis
Time Series Analysis with Decomposition
MSD-Mixer Overview
Multi-Scale Temporal Patching
Patch Encoder and Decoder
Residual Loss
Summary
Illustrative Experimental Results
...and 23 more sections

Figures (6)

Figure 1: (a) Decomposition of time series. (b) Comparison of single time points and multi-scale sub-series.
Figure 2: MSD-Mixer overview.
Figure 3: Examples of multi-scale temporal patching. The channel dimension is omitted for simplicity.
Figure 4: (a) MLP block. (b) Patch Encoder. (c) Patch Decoder.
Figure 5: Examples of decomposition.
...and 1 more figure
