BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling
Li weile, Liu Xiao
TL;DR
The paper addresses the challenge of scaling large time-series models by replacing the Transformer backbone of Timer with RWKV-7, whose blocks combine time mix and channel mix and which is implemented in an implicit DEQ framework to enable effectively infinite-depth recurrence. The authors show that a 1.6M-parameter Rimer model can match or exceed a 37.8M-parameter Timer model while training up to ≈4.5× faster across multiple datasets, and they report strong cross-hardware compatibility via Triton on ROCm platforms. Key contributions include revisiting RWKV-7 for time series, integrating it into a Transformer-based architecture, and presenting DEQ-based implicit layers for efficiency, with code and weights released publicly. The work positions RWKV-7 as a practical, scalable alternative for large-scale time-series modeling, with significant gains in efficiency and robustness on forecasting tasks.
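A deep equilibrium (DEQ) layer replaces a stack of weight-tied layers with the fixed point z* = f(z*, x), which is what "effectively infinite-depth" refers to. The sketch below is a minimal illustration, not the authors' implementation: it assumes a hypothetical tanh layer (`make_layer`) with a contractive weight matrix and uses plain fixed-point iteration (`fixed_point`) as the solver. Practical DEQs usually use faster solvers such as Anderson or Broyden iteration and compute gradients by implicit differentiation; both are omitted here.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def make_layer(weight: Matrix, bias: Vector, x: Vector) -> Callable[[Vector], Vector]:
    """Hypothetical weight-tied layer f(z) = tanh(W z + x + b); x is the input injection."""
    def f(z: Vector) -> Vector:
        return [math.tanh(sum(wij * zj for wij, zj in zip(row, z)) + xi + bi)
                for row, xi, bi in zip(weight, x, bias)]
    return f


def fixed_point(f: Callable[[Vector], Vector], z0: Vector,
                tol: float = 1e-10, max_iter: int = 500) -> Vector:
    """Plain fixed-point iteration z <- f(z) until the update falls below tol."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


# Each row of W has absolute sum < 1 and tanh is 1-Lipschitz, so f is a
# contraction and the equilibrium z* (the "infinitely deep" output) is unique.
W = [[0.2, -0.1], [0.05, 0.3]]
b = [0.0, 0.1]
x = [0.5, -0.3]
f = make_layer(W, b, x)
z_star = fixed_point(f, [0.0, 0.0])
```

Because the equilibrium does not depend on the starting point, the forward pass costs one solver run instead of a fixed number of stacked layers. Memory is then decoupled from the effective depth.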
Abstract
Time series models face significant challenges in scaling to large and complex datasets in the way large language models (LLMs) have scaled. The unique characteristics of time series data and the computational demands of model scaling call for new approaches. While researchers have explored architectures such as Transformers, LSTMs, and GRUs to address these challenges, we propose a novel solution based on RWKV-7, which incorporates meta-learning into its state update mechanism. By integrating RWKV-7's time mix and channel mix components into the Transformer-based time series model Timer, we achieve performance improvements of approximately 1.13× to 43.3× and a 4.5× reduction in training time while using only 1/23 of the parameters. Our code and model weights are publicly available for further research and development at https://github.com/Alic-Li/BlackGoose_Rimer.
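One way to read "meta-learning in the state update" is RWKV-7's generalized delta rule: the recurrent state acts as a fast-weight memory that decays per channel and, at each step, partially erases the value stored under the current key before writing a new key-value pair. The sketch below is a heavily simplified single-head illustration, not the Rimer code. The helper names (`time_mix_step`, `channel_mix`) are hypothetical. It reuses k as the replacement key and omits the data-dependent LoRA projections that produce w and a, the token shift inside time mix, the bonus term, GroupNorm, multiple heads, and the optimized Triton kernels. The channel mix is shown as a token shift followed by a squared-ReLU feed-forward; exact gating differs across RWKV versions.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def outer(u: Vector, v: Vector) -> Matrix:
    """Outer product: rows indexed by u, columns by v."""
    return [[ui * vj for vj in v] for ui in u]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def vecmat(x: Vector, m: Matrix) -> Vector:
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def time_mix_step(state: Matrix, r: Vector, w: Vector, k: Vector,
                  v: Vector, a: Vector) -> Tuple[Matrix, Vector]:
    """Simplified RWKV-7-style state update for one head and one time step.

    state_t = state_{t-1} @ (diag(w) - kappa^T (a * kappa)) + v^T k
    y_t     = state_t @ r, with kappa = k / ||k||.
    """
    n = len(k)
    norm = math.sqrt(sum(x * x for x in k)) or 1.0
    kappa = [x / norm for x in k]
    # Rank-1 delta-rule correction: removes a fraction a of the value
    # currently retrieved by kappa before the new value is written.
    erase = outer(kappa, [ai * ci for ai, ci in zip(a, kappa)])
    transition = [[(w[i] if i == j else 0.0) - erase[i][j] for j in range(n)]
                  for i in range(n)]
    decayed = matmul(state, transition)
    # Write the new key-value association into the state.
    write = outer(v, k)
    new_state = [[s + u for s, u in zip(srow, urow)] for srow, urow in zip(decayed, write)]
    return new_state, matvec(new_state, r)


def channel_mix(x: Vector, x_prev: Vector, mu: Vector, w_k: Matrix, w_v: Matrix) -> Vector:
    """Token shift (interpolate toward the previous step) plus a squared-ReLU feed-forward."""
    shifted = [xi + mi * (pi - xi) for xi, pi, mi in zip(x, x_prev, mu)]
    hidden = [max(0.0, h) ** 2 for h in vecmat(shifted, w_k)]
    return vecmat(hidden, w_v)


# Usage: with no decay (w = 1) and full in-context learning rate (a = 1),
# writing a second value under the same key replaces the first one.
S = [[0.0, 0.0], [0.0, 0.0]]
S, _ = time_mix_step(S, [1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 2.0], [1.0, 1.0])
S, y = time_mix_step(S, [1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [5.0, 6.0], [1.0, 1.0])
print(y)  # [5.0, 6.0]
```

Setting a = 0 and w = 1 turns the update into plain linear attention, where new values are only ever added. The erase term is what lets the fixed-size state overwrite stale associations instead of accumulating them.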
