SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling

Jiaxiang Dong, Haixu Wu, Haoran Zhang, Li Zhang, Jianmin Wang, Mingsheng Long

TL;DR

This work tackles pre-training for time series, starting from the observation that randomly masking time points can ruin vital temporal variations and make direct reconstruction too hard to guide representation learning. It introduces SimMTM, a manifold-aware masked modeling framework that reconstructs the original series from multiple masked neighbors via a weighted, neighborhood-guided aggregation of point-wise representations. A neighborhood constraint further aligns series-wise representations with the local manifold structure, supporting transfer to forecasting and classification tasks in both in-domain and cross-domain settings. Empirically, SimMTM consistently achieves state-of-the-art fine-tuning performance across a broad set of real-world datasets and generalizes well to limited-data scenarios and diverse base models, highlighting its potential as a foundation-model-style approach for time series analysis.

Abstract

Time series analysis is widely used across many areas. Recently, to reduce labeling expenses and benefit various tasks, self-supervised pre-training has attracted immense interest. One mainstream paradigm is masked modeling, which successfully pre-trains deep models by learning to reconstruct the masked content based on the unmasked part. However, since the semantic information of time series is mainly contained in temporal variations, the standard way of randomly masking a portion of time points will seriously ruin vital temporal variations of time series, making the reconstruction task too difficult to guide representation learning. We thus present SimMTM, a Simple pre-training framework for Masked Time-series Modeling. By relating masked modeling to manifold learning, SimMTM proposes to recover masked time points by the weighted aggregation of multiple neighbors outside the manifold, which eases the reconstruction task by assembling ruined but complementary temporal variations from multiple masked series. SimMTM further learns to uncover the local structure of the manifold, which is helpful for masked modeling. Experimentally, SimMTM achieves state-of-the-art fine-tuning performance compared to the most advanced time series pre-training methods in two canonical time series analysis tasks: forecasting and classification, covering both in- and cross-domain settings.
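
The weighted aggregation described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `aggregate_pointwise`, the cosine similarity, the softmax weighting, and the `temperature` value are all assumptions chosen for illustration, and the encoder that would produce the representations is replaced by random arrays.

```python
import numpy as np

def aggregate_pointwise(series_repr, point_repr, temperature=0.1):
    """Rebuild each series' point-wise representation from the other series.

    series_repr: (K, D) array, one vector per series (original or masked).
    point_repr:  (K, L, D) array, point-wise representations of the same K series.
    Returns the (K, L, D) aggregated representations and the (K, K) weights.
    All names and the temperature default are hypothetical.
    """
    # Cosine similarity between every pair of series-wise representations.
    norms = np.linalg.norm(series_repr, axis=1, keepdims=True)
    unit = series_repr / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    # A series must not contribute to its own reconstruction.
    np.fill_diagonal(sim, -np.inf)
    # Temperature-scaled softmax over the remaining series (row-wise).
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted sum of the other series' point-wise representations.
    aggregated = np.einsum("ij,jld->ild", weights, point_repr)
    return aggregated, weights

# Toy usage: random arrays stand in for encoder outputs.
rng = np.random.default_rng(0)
s = rng.normal(size=(4, 3))      # 4 series, 3-dim series-wise vectors
z = rng.normal(size=(4, 8, 3))   # 4 series, 8 time points, 3-dim point-wise vectors
agg, w = aggregate_pointwise(s, z)
```

Each row of `w` sums to one and has a zero on the diagonal, so every series is rebuilt only from its neighbors, with the most similar neighbors contributing most. In a full pipeline a decoder would map `agg` back to the time domain before a reconstruction loss is computed.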

Paper Structure (42 sections, 9 equations, 6 figures, 25 tables)

This paper contains 42 sections, 9 equations, 6 figures, 25 tables.

Figures (6)

  • Figure 1: Comparison between (a) canonical masked modeling and (b) SimMTM in both manifold perspective and reconstruction performance. The showcase recovers a time series with 50% of its time points masked.
  • Figure 2: Architecture of SimMTM, which reconstructs the original time series by adaptively aggregating multiple masked time series based on series-wise similarities learned contrastively from data.
  • Figure 3: Performance comparison of time series pre-training methods in forecasting (MSE$\downarrow$) and classification (Acc$\uparrow$) tasks, including both in-domain (left) and cross-domain (right) settings.
  • Figure 4: Ablations of SimMTM on the reconstruction loss (${\cal L}_{\text{rec.}}$) and constraint loss (${\cal L}_{\text{con.}}$) in time series forecasting (left part) and classification (right part) tasks under both in-domain and cross-domain settings. More ablations are included in Appendix \ref{app:fullresults}.
  • Figure 5: Model analysis. The left part shows fine-tuning of an ETTh2 pre-trained model on ETTh1 with limited data, where a smaller MSE indicates better performance. The right part presents the MSE of SimMTM in the ETTh1 "input-336-predict-96" in-domain setting with different masked ratios $r$ and numbers of masked series $M$, where a darker red means better performance.
  • ...and 1 more figures
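
The Figure 5 caption refers to a masked ratio $r$ and a number of masked series $M$. As a rough illustration of what those two knobs control, the sketch below produces $M$ randomly masked copies of one series. The function name `make_masked_series`, the independent per-point masking, and the zero fill value are assumptions made for this example, not details confirmed by the text above.

```python
import numpy as np

def make_masked_series(x, mask_ratio=0.5, num_masked=3, seed=0):
    """Return num_masked randomly masked copies of x and the keep-masks.

    x: array of shape (L,) or (L, C) holding L time points.
    Each time point is zeroed independently with probability mask_ratio
    (an assumed masking scheme, chosen for simplicity).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # keep[m, t] is True where time point t survives in masked copy m.
    keep = rng.random((num_masked, x.shape[0])) >= mask_ratio
    # Add trailing axes so the mask broadcasts over channels, if any.
    keep_b = keep.reshape(keep.shape + (1,) * (x.ndim - 1))
    masked = np.where(keep_b, x[None, ...], 0.0)
    return masked, keep

# Toy usage: one sine wave, r = 0.5, M = 3.
t = np.linspace(0.0, 2.0 * np.pi, 48)
masked, keep = make_masked_series(np.sin(t), mask_ratio=0.5, num_masked=3)
```

Because each copy drops a different random subset of points, the copies carry complementary fragments of the original variation, which is the property the aggregation step relies on.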