ms-Mamba: Multi-scale Mamba for Time-Series Forecasting
Yusuf Meric Karadag, Sinan Kalkan, Ipek Gursel Dino
TL;DR
ms-Mamba addresses multi-scale time-series forecasting by introducing parallel Mamba blocks with different sampling rates to capture signals across temporal scales. It combines an embedding layer with a Multi-scale Mamba Layer that aggregates outputs across scales using fixed, learnable, or dynamic sampling rates, and it is trained end-to-end with a standard MSE objective. Across 13 real-world datasets, ms-Mamba achieves state-of-the-art or competitive results relative to Transformer-based and Mamba-based baselines, often with fewer parameters, lower memory use, and fewer multiply-accumulate operations (MACs), particularly on datasets with pronounced multi-scale structure. The approach demonstrates the practical value of explicitly modeling multiple temporal scales in time-series forecasting and suggests broad applicability to other modalities and hybrid architectures.
Abstract
Time-series forecasting is generally addressed with recurrent, Transformer-based, and, more recently, Mamba-based architectures. However, existing architectures generally process their input at a single temporal scale, which may be sub-optimal for tasks where information changes over multiple time scales. In this paper, we introduce a novel architecture called Multi-scale Mamba (ms-Mamba) to address this gap. ms-Mamba incorporates multiple temporal scales by using multiple Mamba blocks with different sampling rates ($\Delta$s). Our experiments on many benchmarks demonstrate that ms-Mamba outperforms state-of-the-art approaches, including recently proposed Transformer-based and Mamba-based models.
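To make the role of the sampling rate concrete, the following is a minimal NumPy sketch of the multi-scale idea, not the ms-Mamba implementation. It assumes a plain (non-selective) diagonal linear state-space model discretized with a zero-order hold, shares the same parameters across scales, and aggregates the per-scale outputs by simple averaging. The function names `ssm_scan` and `multi_scale_ssm`, the parameter values, and the averaging rule are all hypothetical choices made for illustration; a real Mamba block additionally makes $\Delta$ and the input/output projections input-dependent.

```python
import numpy as np


def ssm_scan(x, a, b, c, delta):
    """Scan a diagonal linear SSM over a 1-D sequence x with step size delta.

    Continuous-time model: h'(t) = a * h(t) + b * x(t),  y(t) = c . h(t),
    where a (all entries negative), b, c are 1-D arrays of equal length.
    Zero-order-hold discretization:
        a_bar = exp(delta * a),  b_bar = (a_bar - 1) / a * b
    """
    a_bar = np.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    h = np.zeros_like(a)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a_bar * h + b_bar * x_t  # state update at this scale
        y[t] = c @ h                 # scalar readout
    return y


def multi_scale_ssm(x, a, b, c, deltas):
    """Run one scan per sampling rate and average the outputs (assumed aggregation)."""
    outputs = np.stack([ssm_scan(x, a, b, c, d) for d in deltas])
    return outputs.mean(axis=0)


# Illustrative inputs (assumed values, not taken from the paper).
x = np.sin(np.linspace(0.0, 8.0 * np.pi, 64))
a = -np.arange(1.0, 5.0)  # four stable diagonal state entries: -1, -2, -3, -4
b = np.ones(4)
c = np.full(4, 0.25)
y = multi_scale_ssm(x, a, b, c, deltas=(0.01, 0.1, 1.0))
```

A small $\Delta$ gives `a_bar` close to 1, so the state changes slowly and integrates over a long history, while a large $\Delta$ makes the state follow recent inputs more closely. Running several scans with different $\Delta$ values in parallel is what exposes the same input at several temporal scales in this sketch.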
