TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, Jun Zhou

TL;DR

TimeMixer introduces a fully MLP-based forecasting model that exploits a novel multiscale mixing paradigm. By disentangling past information into seasonal and trend components with Past-Decomposable-Mixing and by ensembling multiscale future predictions with Future-Multipredictor-Mixing, it achieves state-of-the-art results across both long-term and short-term tasks while remaining efficient. The approach is extensively validated on diverse real-world benchmarks, supported by thorough ablations and visualizations that illuminate the distinct roles of seasonal and trend channels and the benefits of scale-wise prediction. The work offers a practical, scalable solution for complex, non-stationary time-series forecasting and suggests future directions for integrating alternative mixing schemes and cross-dimension interactions.

Abstract

Time series forecasting is widely used in applications such as traffic planning and weather forecasting. However, real-world time series usually present intricate temporal variations, making forecasting extremely challenging. Going beyond the mainstream paradigms of plain decomposition and multiperiodicity analysis, we analyze temporal variations from a novel multiscale-mixing view, based on an intuitive but important observation: time series present distinct patterns at different sampling scales. Microscopic and macroscopic information are reflected in fine and coarse scales respectively, so complex variations can be inherently disentangled. Based on this observation, we propose TimeMixer, a fully MLP-based architecture with Past-Decomposable-Mixing (PDM) and Future-Multipredictor-Mixing (FMM) blocks that take full advantage of disentangled multiscale series in both the past extraction and future prediction phases. Concretely, PDM applies decomposition to the multiscale series and then mixes the decomposed seasonal and trend components in fine-to-coarse and coarse-to-fine directions respectively, which successively aggregates the microscopic seasonal and macroscopic trend information. FMM further ensembles multiple predictors to utilize complementary forecasting capabilities in multiscale observations. Consequently, TimeMixer achieves consistent state-of-the-art performance in both long-term and short-term forecasting tasks with favorable run-time efficiency.
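
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the average-pooling downsampler, the moving-average decomposition, the kernel size, and the helper names (`downsample`, `decompose`, `linear`, `timemixer_sketch`) are all assumptions made for illustration, and the random, untrained matrices merely stand in for learned temporal layers. Summing the per-scale predictions is likewise one assumed way to ensemble them.

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample(x, num_scales):
    """Halve a 1-D series repeatedly by average pooling (assumed downsampler)."""
    scales = [x]
    for _ in range(num_scales):
        prev = scales[-1]
        prev = prev[: len(prev) // 2 * 2]  # drop a trailing odd sample
        scales.append(prev.reshape(-1, 2).mean(axis=1))
    return scales

def decompose(x, kernel=5):
    """Moving-average split: trend is the smoothed series, seasonal the remainder."""
    pad = kernel // 2
    padded = np.pad(x, pad, mode="edge")
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    return x - trend, trend

def linear(in_len, out_len):
    """Hypothetical stand-in for a learned temporal layer (random, untrained)."""
    w = rng.normal(scale=1.0 / np.sqrt(in_len), size=(out_len, in_len))
    return lambda v: w @ v

def timemixer_sketch(x, horizon, num_scales=2):
    scales = downsample(x, num_scales)  # index 0 is finest, last is coarsest
    parts = [decompose(s) for s in scales]
    seasonal = [p[0] for p in parts]
    trend = [p[1] for p in parts]

    # Seasonal mixing, fine-to-coarse: each scale absorbs the next finer one.
    for m in range(1, len(scales)):
        mix = linear(len(seasonal[m - 1]), len(seasonal[m]))
        seasonal[m] = seasonal[m] + mix(seasonal[m - 1])

    # Trend mixing, coarse-to-fine: each scale absorbs the next coarser one.
    for m in range(len(scales) - 2, -1, -1):
        mix = linear(len(trend[m + 1]), len(trend[m]))
        trend[m] = trend[m] + mix(trend[m + 1])

    # One predictor per scale; the per-scale forecasts are summed (assumed ensemble).
    per_scale = [linear(len(s), horizon)(s + t) for s, t in zip(seasonal, trend)]
    return per_scale, np.sum(per_scale, axis=0)

# Toy input: a daily-like cycle plus a slow linear drift, 96 steps long.
x = np.sin(np.arange(96) * 2 * np.pi / 24) + 0.01 * np.arange(96)
per_scale, forecast = timemixer_sketch(x, horizon=96)
print([len(s) for s in downsample(x, 2)])  # [96, 48, 24]
print(forecast.shape)                      # (96,)
```

Because the weights are random, the forecast values themselves are meaningless; the sketch only shows how the scales, the two mixing directions, and the per-scale predictors fit together.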

Paper Structure

This paper contains 48 sections, 7 equations, 18 figures, 25 tables.

Figures (18)

  • Figure 1: Overall architecture of TimeMixer, which consists of Past-Decomposable-Mixing and Future-Multipredictor-Mixing for past observations and future predictions respectively.
  • Figure 2: The temporal linear layer in seasonal mixing (a), trend mixing (b) and future prediction (c).
  • Figure 3: Visualization of temporal linear weights in seasonal mixing (Eq. \ref{equ:season_mxiing}), trend mixing (Eq. \ref{equ:trend_mixing}), and predictions from multiscale season-trend items. All experiments are on the ETTh1 dataset under the input-96-predict-96 setting.
  • Figure 4: Visualization of predictions from different scales ($\widehat{\mathbf{x}}_{m}^L$ in Eq. \ref{equ:fmm}) under the input-96-predict-96 setting of the ETTh1 dataset. The implementation details are included in Appendix \ref{sec:detail}.
  • Figure 5: Efficiency analysis of both GPU memory and running time. The results are recorded on the ETTh1 dataset with a batch size of 16. The running time is averaged over $10^2$ iterations.
  • ...and 13 more figures