Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding

MinCheol Jeon

TL;DR

AdaMamba tackles non-stationarity and distribution drift in time-series forecasting by unifying adaptive normalization, multi-scale trend extraction, and selective state-space modeling within a single framework. The architecture combines an Adaptive Normalization Block, patch-based embeddings, a Split-Mamba Contextual Encoder, and a Mixture-of-Experts FFN, followed by trend-consistent de-normalization. Empirical results on ETTh/ETTm and Weather benchmarks show consistent gains over Transformer baselines and competitive performance against recent SSM- and patch-based methods, highlighting improved stability and long-horizon accuracy. The work suggests strong potential for extending AdaMamba to probabilistic forecasting to better quantify uncertainty in non-stationary environments.

Abstract

Time series forecasting in real-world environments faces significant challenges: non-stationarity, multi-scale temporal patterns, and distributional shifts that degrade model stability and accuracy. This study proposes AdaMamba, a unified forecasting architecture that integrates adaptive normalization, multi-scale trend extraction, and contextual sequence modeling to address these challenges. AdaMamba begins with an Adaptive Normalization Block that removes non-stationary components through multi-scale convolutional trend extraction and channel-wise recalibration, enabling consistent detrending and variance stabilization. The normalized sequence is then processed by a Context Encoder that combines patch-wise embeddings, positional encoding, and a Mamba-enhanced Transformer layer with a mixture-of-experts feed-forward module, allowing efficient modeling of both long-range dependencies and local temporal dynamics. A lightweight prediction head generates multi-horizon forecasts, and a denormalization mechanism reconstructs outputs by reintegrating local trends to ensure robustness under varying temporal conditions. AdaMamba provides strong representational capacity with modular extensibility, supporting deterministic prediction and compatibility with probabilistic extensions. Its design mitigates covariate shift and enhances predictive reliability across heterogeneous datasets. Experimental evaluations demonstrate that AdaMamba's combination of adaptive normalization and expert-augmented contextual modeling yields consistent improvements in stability and accuracy over conventional Transformer-based baselines.
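The normalize-then-denormalize idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`moving_average`, `normalize`, `denormalize`), the kernel sizes, and the use of fixed averaging windows in place of learned multi-scale convolutions and channel-wise recalibration are all assumptions made for illustration. Holding the last observed trend value over the forecast horizon is likewise a simplifying assumption about how local trends might be reintegrated.

```python
import numpy as np


def moving_average(x, kernel):
    """Centered moving average over time with edge replication.

    x has shape (T, C): T time steps, C channels. Output has the same shape.
    """
    pad_left = (kernel - 1) // 2
    pad_right = kernel - 1 - pad_left
    padded = np.pad(x, ((pad_left, pad_right), (0, 0)), mode="edge")
    window = np.ones(kernel) / kernel
    columns = [
        np.convolve(padded[:, c], window, mode="valid")
        for c in range(x.shape[1])
    ]
    return np.stack(columns, axis=1)


def normalize(x, kernels=(3, 7, 15), eps=1e-5):
    """Remove a multi-scale trend, then rescale each channel.

    Assumption: the trend is the plain average of several moving averages.
    A learned model would weight the scales instead.
    """
    trend = np.mean([moving_average(x, k) for k in kernels], axis=0)
    residual = x - trend
    scale = residual.std(axis=0, keepdims=True) + eps
    return residual / scale, {"trend": trend, "scale": scale}


def denormalize(y_norm, stats):
    """Map normalized forecasts of shape (H, C) back to the input scale.

    Assumption: the last observed trend value is held constant over the horizon.
    """
    last_trend = stats["trend"][-1:]  # shape (1, C), broadcasts over H
    return y_norm * stats["scale"] + last_trend
```

In use, a forecasting model would consume the first return value of `normalize(x)` and its normalized predictions would be passed through `denormalize` together with the stored statistics, so that the model itself only sees detrended, variance-stabilized inputs.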

Paper Structure

This paper contains 23 sections, 19 equations, 2 figures, 1 table, 2 algorithms.

Figures (2)

  • Figure 1: Overall architecture of AdaMamba.
  • Figure 2: Holistic performance comparison using log-normalized radar charts across five benchmark datasets. The charts compare five error metrics: MAE, MSE, RMSE, MAPE, and RSE. The center of the chart represents optimal performance (minimum error), while the outer periphery indicates maximum error magnitude among the compared models. AdaMamba (brown line with markers) consistently forms the most tightly concentrated polygon around the center, indicating superior overall performance across multiple metrics compared to baselines like Informer (green line), which shows significantly larger error areas.