Bidirectional Long-Range Parser for Sequential Data Understanding
George Leotescu, Daniel Voinea, Alin-Ionut Popa
TL;DR
The paper tackles the difficulty of applying Transformers to long sequences due to quadratic attention complexity. It introduces Bidirectional Long-Range Parser (BLRP), which combines local-window attention with a global bidirectional latent-space synthesis by splitting the input $\mathbf{X}$ into segments $\mathbf{X}_i$ of size $t$ and maintaining a latent block $\mathbf{L} \in \mathbb{R}^{l \times d}$ initialized via a learned projection $\Phi(\mathbf{X})$. A forward and a backward pass update the latent block through interleaved self-attention and cross-attention, producing a final representation $\mathbf{L}^{\text{FINAL}}$ that encodes both proximal and distant dependencies for downstream tasks. BLRP achieves competitive or superior results on the Long-Range-Arena benchmarks (ListOps, Text, Retrieval) and CIFAR with around $2.58 \times 10^5$ parameters, while ablation studies highlight the critical roles of bidirectional flow and dynamic-projection initializations. The method offers a scalable, versatile approach to long-sequence understanding across language and vision and provides a flexible framework for incorporating global context into sequential data processing.
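The mechanism summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`attention`, `blrp_sketch`), the random stand-in for the learned projection $\Phi$, the omission of learned query/key/value weights, the residual updates, and the choice to run the backward pass on the output of the forward pass are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention; q: (m, d), k and v: (n, d) -> (m, d).
    # Assumption: no learned Q/K/V projections, to keep the sketch short.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def blrp_sketch(X, t, l, rng):
    """Hypothetical bidirectional latent update over segments of size t."""
    n, d = X.shape
    # Assumption: Phi is a linear map over the sequence axis; a random
    # matrix stands in for the learned projection described in the text.
    Phi = rng.standard_normal((l, n)) / np.sqrt(n)
    L = Phi @ X  # latent block, shape (l, d)

    # Split the input into segments X_i of (at most) t rows each.
    segments = [X[i:i + t] for i in range(0, n, t)]

    def sweep(L, segs):
        for seg in segs:
            seg = seg + attention(seg, seg, seg)  # local-window self-attention
            L = L + attention(L, seg, seg)        # cross-attention: latent reads segment
            L = L + attention(L, L, L)            # self-attention inside the latent block
        return L

    L = sweep(L, segments)        # forward pass, first segment to last
    L = sweep(L, segments[::-1])  # backward pass, last segment to first
    return L                      # plays the role of L^FINAL, shape (l, d)


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))          # toy sequence: n = 50, d = 8
L_final = blrp_sketch(X, t=16, l=4, rng=rng)
print(L_final.shape)
```

The latent block keeps a fixed shape of $l \times d$ regardless of the sequence length, which is what lets a downstream head consume it directly; how the two passes are actually combined in BLRP is not specified in this summary.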
Abstract
The transformer is a powerful data modelling framework responsible for remarkable performance on a wide range of tasks. However, it is limited in terms of scalability, as processing long-sequence data with it is suboptimal and inefficient. To this end, we introduce BLRP (Bidirectional Long-Range Parser), a novel and versatile attention mechanism designed to increase performance and efficiency on long-sequence tasks. It leverages short- and long-range heuristics in the form of a local sliding-window approach combined with a global bidirectional latent-space synthesis technique. We show the benefits and versatility of our approach in the vision and language domains by demonstrating competitive results against state-of-the-art methods on the Long-Range-Arena and CIFAR benchmarks, together with ablations demonstrating its computational efficiency.
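As a rough illustration of the efficiency argument, consider a back-of-the-envelope operation count. The setup is an assumption, not something stated in the abstract: sequence length $n$, segment size $t$, a latent block of $l$ rows, feature width $d$, and one local self-attention, one cross-attention and one latent self-attention per segment in each direction. Under those assumptions the attention cost compares as:

$$
\underbrace{O(n^2 d)}_{\text{full self-attention}}
\quad \text{vs.} \quad
\underbrace{2 \cdot \frac{n}{t} \cdot O\big((t^2 + l\,t + l^2)\, d\big) = O\big(n\,(t + l + l^2/t)\, d\big)}_{\text{segment-wise, forward and backward pass}}
$$

For fixed $t$ and $l$, the second expression grows linearly in $n$ rather than quadratically.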
