Fixed-Point RNNs: Interpolating from Diagonal to Dense
Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, Antonio Orvieto
TL;DR
This work introduces Fixed-Point RNNs (FP-RNNs) to bridge diagonal and dense linear RNNs by formulating dense transitions as fixed points of diagonal recurrences. The core idea is to parameterize a diagonal RNN with a contraction via Lambda_t and a channel mixer Q_t, then solve for the fixed point h^* (and, for matrix states, H_t^*) through fixed-point iterations, effectively trading parallelism for expressivity as needed. The FP-RNN framework is instantiated in FP-Mamba, which extends to matrix states with structured, input-dependent mixers (Diagonal+Low Rank, Householder, Kronecker) and a shifted hidden-state dependence to enhance copying and state tracking. Training leverages implicit differentiation to avoid storing the full fixed-point iterations, and theoretical results guarantee stable convergence under contractive conditions. Empirically, FP-Mamba achieves strong state-tracking on A5 and S5 and competitive copying performance, with adaptive iteration counts enabling longer sequence generalization while maintaining fixed parameter budgets. This framework offers a scalable path to highly expressive sequence mixers that can interpolate between fast diagonal processing and dense, memory-rich dynamics suitable for long-range dependencies.
Abstract
Linear recurrent neural networks (RNNs) and state-space models (SSMs) such as Mamba have become promising alternatives to softmax-attention as sequence mixing layers in Transformer architectures. Current models, however, do not exhibit the full state-tracking expressivity of RNNs because they rely on channel-wise (i.e. diagonal) sequence mixing. In this paper, we investigate parameterizations of a large class of dense linear RNNs as fixed-points of parallelizable diagonal linear RNNs. The resulting models can naturally trade expressivity for efficiency at a fixed number of parameters and achieve state-of-the-art results on the state-tracking benchmarks $A_5$ and $S_5$, while matching performance on copying and other tasks.
