TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis

Shiyu Wang, Jiawei Li, Xiaoming Shi, Zhou Ye, Baichuan Mo, Wenze Lin, Shengtong Ju, Zhixuan Chu, Ming Jin

TL;DR

TimeMixer++ presents a universal time series pattern machine that jointly models multi-scale and multi-periodic dynamics through multi-resolution time imaging (MRTI) and time image decomposition (TID) with dual-axis attention, followed by hierarchical multi-scale and multi-resolution mixing (MCM/MRM). The encoder-only architecture processes multi-scale inputs, converts them into time images, disentangles seasonality and trend, and ensembles predictions across scales, delivering state-of-the-art results across 8 tasks and 30 benchmarks. Extensive experiments demonstrate strong generalization in forecasting, imputation, few-shot/zero-shot learning, and detection/classification, supported by ablation and representation analyses. The approach shows broad practical potential for energy, weather, finance, and industrial monitoring, and lays a foundation for scaling time series pattern machines as larger data resources become available.

Abstract

Time series analysis plays a critical role in numerous applications, supporting tasks such as forecasting, classification, anomaly detection, and imputation. In this work, we present the time series pattern machine (TSPM), a model designed to excel in a broad range of time series tasks through powerful representation and pattern extraction capabilities. Traditional time series models often struggle to capture universal patterns, limiting their effectiveness across diverse tasks. To address this, we define multiple scales in the time domain and various resolutions in the frequency domain, employing various mixing strategies to extract intricate, task-adaptive time series patterns. Specifically, we introduce a general-purpose TSPM that processes multi-scale time series using (1) multi-resolution time imaging (MRTI), (2) time image decomposition (TID), (3) multi-scale mixing (MCM), and (4) multi-resolution mixing (MRM) to extract comprehensive temporal patterns. MRTI transforms multi-scale time series into multi-resolution time images, capturing patterns across both temporal and frequency domains. TID leverages dual-axis attention to extract seasonal and trend patterns, while MCM hierarchically aggregates these patterns across scales. MRM adaptively integrates all representations across resolutions. This method achieves state-of-the-art performance across 8 time series analytical tasks, consistently surpassing both general-purpose and task-specific models. Our work marks a promising step toward the next generation of TSPMs, paving the way for further advancements in time series analysis.
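To make the "multiple scales in the time domain" and "time imaging" ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes three things the abstract does not specify: that coarser scales are obtained by average pooling with a factor of 2, that candidate periods are picked from the largest FFT amplitudes, and that a series is folded into a 2D image by zero-padding and reshaping. The function names `multi_scale`, `top_k_periods`, and `time_image` are hypothetical.

```python
import numpy as np


def multi_scale(x, num_scales=3):
    """Return a list of series, each half the length of the previous one.

    Assumption: downsampling is plain average pooling with factor 2.
    """
    scales = [np.asarray(x, dtype=float)]
    for _ in range(num_scales - 1):
        prev = scales[-1]
        usable = len(prev) - len(prev) % 2  # drop a trailing odd sample
        scales.append(prev[:usable].reshape(-1, 2).mean(axis=1))
    return scales


def top_k_periods(x, k=2):
    """Pick up to k dominant periods from the FFT amplitude spectrum.

    Assumption: period = series length // frequency index.
    """
    x = np.asarray(x, dtype=float)
    amp = np.abs(np.fft.rfft(x - x.mean()))
    amp[0] = 0.0  # ignore the constant (DC) component
    freqs = np.argsort(amp)[::-1][:k]
    return [max(1, len(x) // int(f)) for f in freqs if f > 0]


def time_image(x, period):
    """Fold a 1D series into a 2D array: rows are cycles, columns are
    positions within a cycle. The tail is zero-padded (assumption)."""
    x = np.asarray(x, dtype=float)
    n_rows = -(-len(x) // period)  # ceiling division
    padded = np.zeros(n_rows * period, dtype=float)
    padded[: len(x)] = x
    return padded.reshape(n_rows, period)


# Usage: a sine wave with period 24, observed for 96 steps.
t = np.arange(96)
series = np.sin(2 * np.pi * t / 24)
for s in multi_scale(series, num_scales=3):
    for p in top_k_periods(s, k=1):
        print(len(s), p, time_image(s, p).shape)
```

In this toy setting each scale yields one image whose columns align with the dominant cycle, so periodic structure appears along one axis and slower variation along the other. That layout is what makes attention along two separate axes a plausible way to separate seasonal from trend patterns.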

Paper Structure

This paper contains 41 sections, 12 equations, 25 figures, 20 tables.

Figures (25)

  • Figure 1: Benchmarking model performance across eight tasks (left) and representation analysis in four tasks (right). For each model on the right, the centered kernel alignment (CKA) similarity (Kornblith et al., 2019) is computed between the representations from the first and last layers.
  • Figure 2: The framework of TimeMixer++. The multi-scale time series is first embedded through an input projection layer, followed by $L$ stacked MixerBlocks. Each block converts the multi-scale input into multi-resolution time images, disentangles seasonality and trend via dual-axis attention, and mixes these patterns using multi-scale and multi-resolution mixing.
  • Figure 3: Results of classification and anomaly detection. The results are averaged over several datasets. Higher accuracy and F1 score indicate better performance. $\ast$ indicates Transformer-based models. See the full anomaly detection and classification result tables in the Appendix for complete results.
  • Figure 4: Visualization of representations on time images for the Traffic dataset. More showcases appear in the paper's additional representation figures for Traffic, ETTm1, and ECL.
  • Figure 5: Illustration of the channel mixing approach and embedding function in the input projection process. This process highlights how variate-wise self-attention captures inter-variable dependencies at the coarsest scale, followed by the projection into an embedding space.
  • ...and 20 more figures
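
The caption of Figure 1 relies on CKA similarity between first-layer and last-layer representations. As a rough illustration of that metric, here is a minimal sketch of the linear variant of CKA. Whether the paper uses the linear or a kernel variant is not stated in this excerpt, so the choice of linear CKA is an assumption, and the function name `linear_cka` is hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two representation matrices.

    x: (n_samples, d1) features from one layer
    y: (n_samples, d2) features from another layer
    Returns a value in [0, 1]; 1 means the representations match up to
    rotation and isotropic scaling.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center each feature dimension across samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Usage with synthetic "layer outputs" (random data, for illustration only).
rng = np.random.default_rng(0)
first_layer = rng.normal(size=(200, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
last_layer_same = 3.0 * first_layer @ rotation  # rotated and rescaled copy
last_layer_other = rng.normal(size=(200, 16))   # unrelated features

print(linear_cka(first_layer, last_layer_same))
print(linear_cka(first_layer, last_layer_other))
```

A high score between the first and last layers means the representation changes little with depth, while a low score means the layers encode different structure. The sketch only computes the metric and says nothing about which regime suits which task.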