xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories
Maurice Kraus, Felix Divo, Devendra Singh Dhami, Kristian Kersting
TL;DR
xLSTM-Mixer is a three-stage, memory-efficient approach to multivariate time series forecasting: it first produces a channel-independent linear forecast, then refines it with a stacked sLSTM that performs time-variate mixing, and finally reconciles two views (original and reversed embeddings) through a learned projection. The method achieves state-of-the-art long-horizon performance across diverse datasets while using significantly less memory than Transformer-based rivals. It is also versatile, delivering competitive probabilistic forecasts on GIFT-Eval and strong results when used as a time series embedding for classification. Comprehensive ablations and analyses confirm that time mixing, memory-based cross-variate mixing, and multi-view reconciliation together drive its robustness and accuracy.
Abstract
Time series data is prevalent across numerous fields, necessitating the development of robust and accurate forecasting models. Capturing patterns both within and between temporal and multivariate components is crucial for reliable predictions. We introduce xLSTM-Mixer, a model designed to effectively integrate temporal sequences, joint time-variate information, and multiple perspectives for robust forecasting. Our approach begins with a linear forecast shared across variates, which is then refined by xLSTM blocks that serve as key elements for modeling the complex dynamics of challenging time series data. Finally, xLSTM-Mixer reconciles two distinct views to produce the final forecast. Our extensive evaluations demonstrate its superior long-term forecasting performance compared to recent state-of-the-art methods while requiring very little memory. A thorough model analysis provides further insights into its key components and confirms its robustness and effectiveness. This work contributes to the resurgence of recurrent models in forecasting by combining them, for the first time, with mixing architectures.
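To make the three stages described above concrete, the following is a minimal, untrained NumPy sketch of the data flow only. It is not the authors' implementation: the function names `simple_recurrence` and `mixer_sketch`, the hidden size, and the random weights are all hypothetical, a plain tanh recurrence stands in for the sLSTM blocks, and both stepping the recurrence over variates and reversing along the embedding axis are assumptions made for illustration.

```python
import numpy as np


def simple_recurrence(tokens, W_in, W_rec):
    """Plain tanh RNN over the first axis of `tokens`.

    Assumption: this is only a stand-in for the sLSTM stack.
    """
    h = np.zeros(W_rec.shape[0])
    outputs = []
    for token in tokens:
        h = np.tanh(W_in @ token + W_rec @ h)
        outputs.append(h)
    return np.stack(outputs)


def mixer_sketch(x, horizon, hidden=8, seed=0):
    """Map a window x of shape (lookback, variates) to (horizon, variates)."""
    rng = np.random.default_rng(seed)
    lookback, n_variates = x.shape

    # Hypothetical random weights; a real model would learn these.
    W_time = rng.normal(scale=0.1, size=(horizon, lookback))
    W_up = rng.normal(scale=0.1, size=(hidden, horizon))
    W_in = rng.normal(scale=0.1, size=(hidden, hidden))
    W_rec = rng.normal(scale=0.1, size=(hidden, hidden))
    W_out = rng.normal(scale=0.1, size=(horizon, 2 * hidden))

    # Stage 1: one linear map over time, shared by every variate.
    initial = W_time @ x                      # (horizon, variates)

    # Stage 2: embed each variate's forecast, then refine recurrently.
    # Assumption: the recurrence steps over variates, one token each.
    tokens = (W_up @ initial).T               # (variates, hidden)
    view_original = simple_recurrence(tokens, W_in, W_rec)

    # Second view: same weights applied to reversed embeddings.
    # Assumption: the reversal is along the embedding axis.
    view_reversed = simple_recurrence(tokens[:, ::-1], W_in, W_rec)

    # Stage 3: reconcile both views with a single projection.
    both = np.concatenate([view_original, view_reversed], axis=1)
    return W_out @ both.T                     # (horizon, variates)


if __name__ == "__main__":
    window = np.sin(np.linspace(0.0, 6.0, 32))[:, None] * np.ones((1, 3))
    print(mixer_sketch(window, horizon=4).shape)
```

Because the stage-1 weights are shared across variates, the number of parameters in this sketch does not grow with the number of variates; only the recurrence length does.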
