ms-Mamba: Multi-scale Mamba for Time-Series Forecasting

Yusuf Meric Karadag, Sinan Kalkan, Ipek Gursel Dino

TL;DR

ms-Mamba addresses multi-scale time-series forecasting (TSF) by running parallel Mamba blocks with different sampling rates, so that each block captures signal at a different temporal scale. It combines an embedding layer with a Multi-scale Mamba Layer that aggregates the block outputs across scales, using fixed, learnable, or dynamic sampling rates, and it is trained end-to-end with a standard MSE objective. Across 13 real-world datasets, ms-Mamba achieves state-of-the-art or competitive results relative to Transformer-based and Mamba-based baselines, often with fewer parameters, lower memory use, and fewer multiply-accumulate operations (MACs), particularly on datasets with pronounced multi-scale structure. The approach demonstrates the practical value of explicitly modeling multiple temporal scales in TSF and suggests applicability to other modalities and hybrid architectures.

Abstract

The problem of time-series forecasting is generally addressed by recurrent, Transformer-based, and the recently proposed Mamba-based architectures. However, existing architectures generally process their input at a single temporal scale, which may be sub-optimal for many tasks where information changes over multiple time scales. In this paper, we introduce a novel architecture called Multi-scale Mamba (ms-Mamba) to address this gap. ms-Mamba incorporates multiple temporal scales by using multiple Mamba blocks with different sampling rates ($\Delta$s). Our experiments on many benchmarks demonstrate that ms-Mamba outperforms state-of-the-art approaches, including the recently proposed Transformer-based and Mamba-based models.
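
For context, the sampling rate $\Delta$ is the step size a Mamba-style state-space model uses to discretize a continuous-time linear system. The block below restates the standard zero-order-hold discretization from the state-space-model literature and applies it once per scale. The block index $i$, the per-block step $\Delta_i$, and the superscript notation are assumptions made here for illustration and are not taken from the paper's own equations. In Mamba itself, $\Delta$, $B$, and $C$ are additionally functions of the input, which this simplified form omits.

```latex
% Continuous-time state-space model
h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)

% Zero-order-hold discretization with step size \Delta_i (one per scale i)
\bar{A}_i = \exp(\Delta_i A), \qquad
\bar{B}_i = (\Delta_i A)^{-1}\bigl(\exp(\Delta_i A) - I\bigr)\,\Delta_i B

% Discrete recurrence run by the i-th block
h^{(i)}_t = \bar{A}_i\,h^{(i)}_{t-1} + \bar{B}_i\,x_t, \qquad
y^{(i)}_t = C\,h^{(i)}_t
```

A larger $\Delta_i$ shrinks $\bar{A}_i$ when $A$ has negative eigenvalues, so that block forgets its state faster and emphasizes recent inputs, while a smaller $\Delta_i$ retains longer history.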

Paper Structure

This paper contains 21 sections, 8 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: (a) Mamba and its variations (S-Mamba) use a single time-scale while processing time-series data. (b) Our ms-Mamba processes its input at different time-scales to better capture signal at different scales.
  • Figure 2: An overview of the proposed method. ms-Mamba processes the time-series data at different sampling rates to better capture the multi-scale nature of the input signal. This is achieved by processing and updating the embeddings with different sampling rates (SR).
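
The idea in Figure 2 can be illustrated with a minimal NumPy sketch: the same input is passed through several state-space scans that differ only in their step size, and the outputs are combined. This is not the authors' implementation. It assumes a diagonal, non-selective SSM with fixed parameters, fixed scale multipliers, and plain averaging as the aggregation rule. The function names `ssm_scan` and `multi_scale_ssm` and the `scales` argument are hypothetical.

```python
import numpy as np

def ssm_scan(x, A, B, C, delta):
    """Run a diagonal linear SSM over a 1-D sequence x with step size delta.

    A, B, C are 1-D arrays of length N (state size); A must be non-zero.
    This is a simplified, non-selective stand-in for a Mamba block.
    """
    A_bar = np.exp(delta * A)        # zero-order-hold discretization of A
    B_bar = (A_bar - 1.0) / A * B    # zero-order-hold discretization of B (diagonal case)
    h = np.zeros_like(A, dtype=float)
    y = np.empty(len(x), dtype=float)
    for t, x_t in enumerate(x):
        h = A_bar * h + B_bar * x_t  # state update
        y[t] = C @ h                 # readout
    return y

def multi_scale_ssm(x, A, B, C, base_delta, scales):
    """Run one scan per scale multiplier and average the outputs.

    Averaging is an assumption of this sketch; the source only says that
    outputs are aggregated across scales.
    """
    outputs = [ssm_scan(x, A, B, C, base_delta * s) for s in scales]
    return np.mean(outputs, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(96)              # toy univariate series
    A = -np.array([0.5, 1.0, 2.0, 4.0])      # stable (negative) diagonal dynamics
    B = np.ones(4)
    C = np.ones(4) / 4
    y = multi_scale_ssm(x, A, B, C, base_delta=0.1, scales=[1.0, 2.0, 4.0])
    print(y.shape)
```

Each multiplier in `scales` gives one scan a different effective memory length, which is the property the multi-scale design relies on. A learnable or dynamic variant would replace the fixed multipliers with trained or input-dependent values.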