Bidirectional Linear Recurrent Models for Sequence-Level Multisource Fusion
Qisai Liu, Zhanhong Jiang, Joshua R. Waite, Chao Liu, Aditya Balu, Soumik Sarkar
TL;DR
The paper addresses long-range sequence modeling by combining the efficiency of linear recurrence with bidirectional context. It introduces BLUR (Bidirectional Linear Unit for Recurrent network), built from forward and backward linear recurrent units (LRUs), a merging layer, and a nonlinear projection, which gives parallelizable, linear-time updates while retaining stability and universal approximation properties. The authors establish stability via eigenvalue constraints and prove a Barron-based universality result showing that BLUR can approximate any causal sequence-to-sequence map as its width grows. Empirically, BLUR outperforms Transformers and prior RNN-based models on sequential image, text, and time-series forecasting tasks, with substantial reductions in computation and favorable scalability to long horizons. The work positions BLUR as a practical, efficient alternative for real-world forecasting tasks and suggests further study of error decay and deployment in broader domains.
Abstract
Sequence modeling is a critical yet challenging task with wide-ranging applications, especially in time series forecasting for domains like weather prediction, temperature monitoring, and energy load forecasting. Transformers, with their attention mechanism, have emerged as the state of the art due to their efficient parallel training, but they suffer from quadratic time complexity, limiting their scalability for long sequences. In contrast, recurrent neural networks (RNNs) offer linear time complexity, spurring renewed interest in linear RNNs for more computationally efficient sequence modeling. In this work, we introduce BLUR (Bidirectional Linear Unit for Recurrent network), which uses forward and backward linear recurrent units (LRUs) to capture both past and future dependencies with high computational efficiency. BLUR maintains the linear time complexity of traditional RNNs, while enabling fast parallel training through LRUs. Furthermore, it offers provably stable training and strong approximation capabilities, making it highly effective for modeling long-term dependencies. Extensive experiments on sequential image and time series datasets reveal that BLUR not only surpasses Transformers and traditional RNNs in accuracy but also significantly reduces computational costs, making it particularly suitable for real-world forecasting tasks. Our code is available here.
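To make the forward/backward structure described above concrete, the following is a minimal sketch of a bidirectional linear recurrence, not the authors' implementation. It rests on several assumptions that do not come from the paper: the input is a scalar sequence, the recurrence is real-valued and diagonal, the stability constraint is taken to be an eigenvalue modulus below one, the two directions are merged by concatenation, and the nonlinear projection is a single tanh unit. The names `linear_scan` and `bidirectional_block` are hypothetical. The loop also runs sequentially for readability, whereas the abstract refers to parallel training.

```python
import math


def linear_scan(xs, lam, b):
    """Diagonal linear recurrence h_t = lam * h_{t-1} + b * x_t, per channel.

    xs is a scalar input sequence; lam and b hold one value per hidden channel.
    Returns the list of hidden states, one per time step.
    """
    h = [0.0] * len(lam)
    states = []
    for x in xs:
        h = [l * hi + bi * x for l, hi, bi in zip(lam, h, b)]
        states.append(h)
    return states


def bidirectional_block(xs, lam_f, lam_b, b_f, b_b, w_out):
    """Forward scan + backward scan, merged by concatenation, then tanh.

    Assumption (not from the paper): stability is enforced by requiring
    every eigenvalue of the diagonal recurrence to have modulus below one.
    """
    if any(abs(l) >= 1.0 for l in list(lam_f) + list(lam_b)):
        raise ValueError("all eigenvalues must satisfy |lambda| < 1")

    fwd = linear_scan(xs, lam_f, b_f)
    # Backward direction: scan the reversed sequence, then restore time order.
    bwd = linear_scan(xs[::-1], lam_b, b_b)[::-1]

    ys = []
    for hf, hb in zip(fwd, bwd):
        merged = hf + hb  # list concatenation of the two hidden states
        ys.append(math.tanh(sum(w * m for w, m in zip(w_out, merged))))
    return ys


# An impulse in the middle of the sequence: the forward state carries it to
# later steps, the backward state carries it to earlier steps.
xs = [0.0, 0.0, 1.0, 0.0, 0.0]
ys = bidirectional_block(xs, [0.5], [0.5], [1.0], [1.0], [1.0, 1.0])
```

With identical forward and backward parameters, the response to a centered impulse is symmetric in time, which shows that each output position sees both past and future inputs. The cost is one pass per direction, so it grows linearly with sequence length.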
