A Mamba Foundation Model for Time Series Forecasting

Haoyu Ma, Yushu Chen, Wenlai Zhao, Jinzhe Yang, Yingsheng Ji, Xinghua Xu, Xiaozhu Liu, Hao Jing, Shengzhuo Liu, Guangwen Yang

TL;DR

TSMamba is a linear-complexity foundation model for time series forecasting built on the Mamba architecture. Its full-shot performance is competitive with or superior to that of task-specific prediction models.

Abstract

Time series foundation models have demonstrated strong performance in zero-shot learning, making them well-suited for predicting rapidly evolving patterns in real-world applications where relevant training data are scarce. However, most of these models rely on the Transformer architecture, which incurs quadratic complexity as input length increases. To address this, we introduce TSMamba, a linear-complexity foundation model for time series forecasting built on the Mamba architecture. The model captures temporal dependencies through both forward and backward Mamba encoders, achieving high prediction accuracy. To reduce reliance on large datasets and lower training costs, TSMamba employs a two-stage transfer learning process that leverages pretrained Mamba LLMs, allowing effective time series modeling with a moderate training set. In the first stage, the forward and backward backbones are optimized via patch-wise autoregressive prediction; in the second stage, the model trains a prediction head and refines other components for long-term forecasting. While the backbone assumes channel independence to manage varying channel numbers across datasets, a channel-wise compressed attention module is introduced to capture cross-channel dependencies during fine-tuning on specific multivariate datasets. Experiments show that TSMamba's zero-shot performance is comparable to state-of-the-art time series foundation models, despite using significantly less training data. It also achieves competitive or superior full-shot performance compared to task-specific prediction models. The code will be made publicly available.
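
The patch-wise autoregressive objective of the first stage can be illustrated with a small sketch of how training pairs might be built. This is not the authors' code: the function names `make_patches` and `autoregressive_pairs`, the non-overlapping patching, and the choice to reverse only the patch order for the backward direction are all assumptions made for illustration.

```python
def make_patches(series, patch_len):
    """Split a 1-D series into non-overlapping patches.

    Assumption: any leading remainder that does not fill a patch is dropped,
    so the most recent values are always kept.
    """
    n = len(series) // patch_len
    start = len(series) - n * patch_len
    return [series[start + i * patch_len: start + (i + 1) * patch_len]
            for i in range(n)]


def autoregressive_pairs(patches, backward=False):
    """Pair each context (patches seen so far) with the next patch to predict.

    backward=True reverses the patch order, so the target is the patch that
    precedes the context in time (a stand-in for backcasting).
    """
    seq = patches[::-1] if backward else patches
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]


patches = make_patches(list(range(10)), patch_len=3)
forward_pairs = autoregressive_pairs(patches)
backward_pairs = autoregressive_pairs(patches, backward=True)
print(patches)            # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(forward_pairs[0])   # ([[1, 2, 3]], [4, 5, 6])
print(backward_pairs[0])  # ([[7, 8, 9]], [4, 5, 6])
```

In this sketch the forward pairs would feed a forward encoder and the backward pairs a backward encoder; how the actual model batches and masks these targets is not specified in the abstract.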

Paper Structure

This paper contains 12 sections, 4 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Comparison of specialized time series models and foundation models: (a) Specialized models are trained separately for specific tasks using relevant datasets. These models lack the ability to generalize across different domains and frequencies. (b) Time series foundation models, trained on large datasets, generalize well across a wide range of scenarios and tasks.
  • Figure 2: TSMamba Architecture: The input time series are preprocessed and then fed into the forward and backward encoders to extract internal dependencies. The two representations are combined and subsequently mapped to forecasts by the prediction head.
  • Figure 3: The first stage of transfer learning involves refining the backbone and training the input embedding through autoregressive forecasting or backcasting tasks. A small linear head is temporarily added to predict the next patch.
  • Figure 4: Compressed Channel-Wise Attention Module for Cross-Channel Dependency. The process starts with a per-channel temporal convolution to align the backbone outputs along the time dimension, followed by linear compression of the channel count. The attention module then extracts relationships between these compressed channels, and the result is linearly mapped to restore the original number of channels. Finally, the output is added back to the backbone as a correction.
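
The sequence of steps in the Figure 4 caption (per-channel temporal convolution, channel compression, attention over compressed channels, channel restoration, residual correction) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the backbone output is simplified to a `(channels, time)` matrix, the attention is single-head, and every name and weight shape (`kernels`, `w_down`, `w_up`, `w_q`, `w_k`, `w_v`) is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compressed_channel_attention(h, kernels, w_down, w_up, w_q, w_k, w_v):
    """Apply a cross-channel correction to backbone outputs h of shape (C, T).

    kernels: (C, K) one temporal kernel per channel
    w_down:  (C_small, C) compresses the channel count
    w_up:    (C, C_small) restores the channel count
    w_q, w_k: (T, D) query/key projections; w_v: (T, T) value projection
    """
    # 1. Per-channel temporal convolution (each channel has its own kernel).
    aligned = np.stack([np.convolve(row, kern, mode="same")
                        for row, kern in zip(h, kernels)])
    # 2. Linear compression along the channel axis: (C, T) -> (C_small, T).
    z = w_down @ aligned
    # 3. Attention in which each compressed channel is one token.
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (C_small, C_small)
    mixed = weights @ v                                # (C_small, T)
    # 4. Restore the original channel count, 5. add back as a correction.
    return h + w_up @ mixed


rng = np.random.default_rng(0)
C, C_small, T, D, K = 8, 3, 16, 4, 3
h = rng.normal(size=(C, T))
out = compressed_channel_attention(
    h,
    rng.normal(size=(C, K)),
    rng.normal(size=(C_small, C)),
    rng.normal(size=(C, C_small)),
    rng.normal(size=(T, D)),
    rng.normal(size=(T, D)),
    rng.normal(size=(T, T)),
)
print(out.shape)  # (8, 16)
```

Because attention runs over `C_small` compressed channels rather than all `C`, the attention matrix is `C_small × C_small`, which is where the saving over full channel-wise attention would come from in this sketch. Setting `w_up` to zeros leaves `h` unchanged, which reflects the residual design: the module only adds a correction to the channel-independent backbone output.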