
Bidirectional Linear Recurrent Models for Sequence-Level Multisource Fusion

Qisai Liu, Zhanhong Jiang, Joshua R. Waite, Chao Liu, Aditya Balu, Soumik Sarkar

TL;DR

The paper addresses long-range sequence modeling by bridging the efficiency of linear recurrence with bidirectional context. It introduces BLUR, a Bidirectional Linear Unit for Recurrent network, built from forward and backward LRUs, a merging layer, and a nonlinear projection to achieve parallelizable, linear-time updates while maintaining stability and universal approximation properties. The authors establish stability via eigenvalue constraints and prove a Barron-based universality result, showing that BLUR can approximate any causal sequence-to-sequence map as its width grows. Empirically, BLUR outperforms Transformers and prior RNN-based models on sequential images, text, and time-series forecasting tasks, with substantial reductions in computation and favorable scalability to long horizons. The work positions BLUR as a practical, efficient alternative for real-world forecasting tasks, while suggesting further exploration of error decay and broader domain deployment.
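The forward/backward structure summarized above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the authors' implementation:

- each time step carries a scalar input;
- both LRUs run the diagonal recurrence $h_t = \lambda \odot h_{t-1} + b\,x_t$ from a zero state;
- the merging layer is a hypothetical pair of readout vectors `c_f` and `c_b`, whose contributions are summed and reduced to their real part;
- `tanh` stands in for the nonlinear projection.

The function names `lru_scan` and `blur_layer` are illustrative only.

```python
import math
from typing import List, Sequence


def lru_scan(xs: Sequence[float], lam: Sequence[complex], b: Sequence[complex]) -> List[List[complex]]:
    """Run the diagonal recurrence h_t = lam * h_{t-1} + b * x_t (elementwise) from a zero state."""
    h: List[complex] = [0j] * len(lam)
    states = []
    for x in xs:
        h = [lam_j * h_j + b_j * x for lam_j, h_j, b_j in zip(lam, h, b)]
        states.append(h)  # a fresh list each step, so stored states are never aliased
    return states


def blur_layer(xs, lam_f, b_f, lam_b, b_b, c_f, c_b) -> List[float]:
    """Hypothetical bidirectional layer: forward LRU, backward LRU, linear merge, tanh."""
    xs = list(xs)
    fwd = lru_scan(xs, lam_f, b_f)              # state at t summarizes x_1..x_t
    bwd = lru_scan(xs[::-1], lam_b, b_b)[::-1]  # run on the reversed sequence, then re-align: x_t..x_L
    out = []
    for hf, hb in zip(fwd, bwd):
        # Linear merge of both states; the real part maps complex states to a real feature.
        merged = sum((c * h).real for c, h in zip(c_f, hf)) + sum((c * h).real for c, h in zip(c_b, hb))
        out.append(math.tanh(merged))  # placeholder for the nonlinear projection
    return out
```

Try `blur_layer([0.0, 0.0, 1.0], [0.5], [1.0], [0.5], [1.0], [0.0], [1.0])`, which keeps only the backward branch. The output at the first position is already nonzero, even though the only nonzero input arrives two steps later. This is the future context that a single forward LRU cannot provide.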

Abstract

Sequence modeling is a critical yet challenging task with wide-ranging applications, especially in time series forecasting for domains like weather prediction, temperature monitoring, and energy load forecasting. Transformers, with their attention mechanism, have emerged as state-of-the-art due to their efficient parallel training, but they suffer from quadratic time complexity, limiting their scalability for long sequences. In contrast, recurrent neural networks (RNNs) offer linear time complexity, spurring renewed interest in linear RNNs for more computationally efficient sequence modeling. In this work, we introduce BLUR (Bidirectional Linear Unit for Recurrent network), which uses forward and backward linear recurrent units (LRUs) to capture both past and future dependencies with high computational efficiency. BLUR maintains the linear time complexity of traditional RNNs, while enabling fast parallel training through LRUs. Furthermore, it offers provably stable training and strong approximation capabilities, making it highly effective for modeling long-term dependencies. Extensive experiments on sequential image and time series datasets reveal that BLUR not only surpasses Transformers and traditional RNNs in accuracy but also significantly reduces computational costs, making it particularly suitable for real-world forecasting tasks. Our code is available here.
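The abstract credits LRUs with fast parallel training. The usual reason is that a diagonal linear recurrence $h_t = \lambda h_{t-1} + x_t$ is a chain of affine maps, and composing affine maps is associative. All hidden states can therefore be computed with a parallel prefix scan instead of a strictly sequential loop.

The sketch below makes several assumptions:

- a single complex eigenvalue;
- a Hillis-Steele scan, which needs $O(\log L)$ parallel rounds and $O(L \log L)$ total work (work-efficient scans bring the total down to $O(L)$).

It checks the scan against the plain loop. It makes no claim about how the BLUR code base implements this step, and the function names are hypothetical.

```python
from typing import List, Tuple

Affine = Tuple[complex, complex]  # (a, b) represents the map h -> a * h + b


def combine(earlier: Affine, later: Affine) -> Affine:
    """Compose two affine maps: apply `earlier`, then `later`. This operator is associative."""
    a1, b1 = earlier
    a2, b2 = later
    return (a1 * a2, a2 * b1 + b2)


def sequential_recurrence(lam: complex, xs: List[float]) -> List[complex]:
    """Reference loop: h_t = lam * h_{t-1} + x_t, starting from h = 0."""
    h, out = 0j, []
    for x in xs:
        h = lam * h + x
        out.append(h)
    return out


def parallel_scan(lam: complex, xs: List[float]) -> List[complex]:
    """Hillis-Steele inclusive scan over the affine maps (lam, x_t)."""
    elems: List[Affine] = [(complex(lam), complex(x)) for x in xs]
    step = 1
    while step < len(elems):
        # Every position reads only the previous round's list, so all updates could run in parallel.
        elems = [combine(elems[i - step], elems[i]) if i >= step else elems[i]
                 for i in range(len(elems))]
        step *= 2
    # With a zero initial state, h_t equals the offset term of the prefix composition.
    return [b for _, b in elems]
```

For any input list, `parallel_scan(lam, xs)` and `sequential_recurrence(lam, xs)` agree up to floating-point rounding. On parallel hardware, the scan trades a few extra multiplications for a logarithmic number of dependent steps.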

Paper Structure

This paper contains 17 sections, 5 theorems, 23 equations, 15 figures, 13 tables.

Key Result

Theorem 4.1

Suppose that the inputs $\bm{v}=(\bm{v}_i)_{i=1}^N\in\mathcal{V}\subseteq\mathbb{R}^{d\times N}$ are bounded, i.e., $\|\bm{v}_i\|<\infty$ for all $i\in[N]$. Denote by $\textnormal{dim}(\mathcal{V})$ the vector-space dimension of $\mathcal{V}$. Let $\mathcal{L}_1$ and $\mathcal{L}_2$ be LRU with eith
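The TL;DR attributes BLUR's stability to eigenvalue constraints on its LRUs. For a diagonal recurrence $h_t = \lambda_j h_{t-1} + b\,x_t$, bounded inputs ($|x_t| \le M$) give bounded states whenever $|\lambda_j| < 1$, since the geometric series yields $|h_t| \le |b|\,M / (1 - |\lambda_j|)$.

One common way to enforce $|\lambda_j| < 1$, as in the original LRU, is to write $\lambda_j = \exp(-\exp(\nu_j) + i\theta_j)$ with unconstrained real $\nu_j$. The modulus is then $\exp(-\exp(\nu_j)) \in (0, 1)$, whatever the phase $\theta_j$. Whether BLUR uses exactly this parameterization is an assumption here, and the helper names are illustrative.

```python
import cmath
import math
from typing import Iterable


def stable_eigenvalue(nu: float, theta: float) -> complex:
    """lambda = exp(-exp(nu) + i*theta); its modulus exp(-exp(nu)) lies in (0, 1) for real nu.

    Caveat: in floating point, a very negative nu can round the modulus to exactly 1.0.
    """
    return cmath.exp(complex(-math.exp(nu), theta))


def state_bound(lam: complex, b: float, m: float) -> float:
    """Geometric-series bound on |h_t| when |x_t| <= m and |lambda| < 1."""
    return abs(b) * m / (1.0 - abs(lam))


def max_state_magnitude(lam: complex, xs: Iterable[float], b: float = 1.0) -> float:
    """Largest |h_t| reached by h_t = lam * h_{t-1} + b * x_t from a zero state."""
    h, peak = 0j, 0.0
    for x in xs:
        h = lam * h + b * x
        peak = max(peak, abs(h))
    return peak
```

For example, `stable_eigenvalue(-5.0, 1.0)` has modulus just below 1, which gives slow decay and long memory. Even so, `max_state_magnitude` over a long run of unit inputs stays under `state_bound(lam, 1.0, 1.0)`.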

Figures (15)

  • Figure 1: BLUR vs. baselines: number of parameters versus MSE on the ETTm$_1$ dataset with horizon 96.
  • Figure 2: Bidirectional Linear Unit for Recurrent (BLUR) network: the model is a stack of Bidirectional LRU blocks with two linear encoders, a linear merging layer that combines the bidirectional LRUs, and a nonlinear projection in between; it uses skip connections and layer (or batch) normalization. The linear encoder maps raw inputs to an embedding space. For convenience, we also assume that all linear and nonlinear projections in the architecture preserve the hidden dimension, unlike the dimensions defined in the function composition of Eq. \ref{blur_function}. See Figure \ref{fig:blur_architecture} for more detail.
  • Figure 3: Overview of the BLUR architecture. The model processes input sequences $\{X\}$, which can be sequential images, text, or sensor time series, using forward and backward Linear Recurrent Units (LRUs). Hidden states are updated through diagonal recurrent matrices $\Lambda_1$ and $\Lambda_2$, followed by fusion via a central combination module $C$ and a non-linear transformation. This enables bidirectional context modeling with linear time complexity.
  • Figure 4: A snippet comparing ground truth with LRU/BLUR predictions of the target variable on the Weather dataset with a horizon of 24.
  • Figure 5: Inference time and test MAE as a function of sequence length on ETTh$_1$.
  • ...and 10 more figures
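Figures 2 and 3 describe a stack of bidirectional LRU blocks, each with a merging layer, a nonlinear projection, a skip connection, and normalization. The sketch below strings these pieces together for sequences of $d$-dimensional vectors. Every concrete choice in it is an illustrative assumption, not the paper's specification:

- layer normalization is applied before the scans;
- eigenvalues are real and per channel, in $(-1, 1)$;
- the input matrix is the identity, so dimensions never change, mirroring the caption's convenience assumption;
- per-channel merge weights replace a full combination module $C$;
- `tanh` serves as the nonlinearity;
- the linear encoders are omitted.

All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Seq = List[List[float]]  # L time steps, each a d-dimensional vector
BlockParams = Tuple[List[float], List[float], List[float], List[float]]  # lam_f, lam_b, w_f, w_b


def layer_norm(v: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalize one d-dimensional vector to zero mean and unit variance (no learned scale/shift)."""
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]


def diag_scan(xs: Seq, lam: List[float], reverse: bool = False) -> Seq:
    """Per-channel recurrence h_t = lam * h_{t-1} + x_t, run left-to-right or right-to-left."""
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    h = [0.0] * len(lam)
    out: Seq = [[] for _ in xs]
    for t in order:
        h = [lam_j * h_j + x_j for lam_j, h_j, x_j in zip(lam, h, xs[t])]
        out[t] = h  # stored at its original time index, so both directions stay aligned
    return out


def blur_block(xs: Seq, lam_f: List[float], lam_b: List[float],
               w_f: List[float], w_b: List[float]) -> Seq:
    """One hypothetical block: norm -> forward/backward scans -> merge -> tanh -> skip connection."""
    normed = [layer_norm(v) for v in xs]
    fwd = diag_scan(normed, lam_f)
    bwd = diag_scan(normed, lam_b, reverse=True)
    out: Seq = []
    for x, hf, hb in zip(xs, fwd, bwd):
        merged = [wf * a + wb * c for wf, a, wb, c in zip(w_f, hf, w_b, hb)]
        out.append([x_j + math.tanh(m_j) for x_j, m_j in zip(x, merged)])  # residual add
    return out


def blur_stack(xs: Seq, blocks: List[BlockParams]) -> Seq:
    """Apply blocks in sequence; dimensions never change, so blocks compose freely."""
    for lam_f, lam_b, w_f, w_b in blocks:
        xs = blur_block(xs, lam_f, lam_b, w_f, w_b)
    return xs
```

Setting both merge weights to zero reduces each block to its skip connection, so the stack returns its input unchanged. This gives a quick sanity check on the residual wiring.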

Theorems & Definitions (10)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Theorem 4.1
  • Lemma A.1
  • Theorem A.2
  • Theorem A.3
  • Lemma A.4
  • Proof
  • Proof