ReMamba: Equip Mamba with Effective Long-Sequence Modeling
Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao
TL;DR
ReMamba addresses the long-context degradation of Mamba, a linear-time state-space model, by introducing a two-stage re-forward mechanism that selectively compresses and adapts distant information. Stage 1 selects salient final-layer hidden states and uses them to replace the span being compressed. Stage 2 integrates these compressed representations into Mamba's state updates, using a trainable control to mitigate information loss. Empirical results on LongBench and L-Eval show substantial gains over the Mamba baseline and performance competitive with same-size transformers, with transfer gains observed on Mamba2 as well. The approach achieves these improvements with only a modest inference overhead, offering a practical path to enhancing the long-context capabilities of memory-efficient models while outlining limitations and avenues for future refinement in state-space updates.
Abstract
While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited compared to transformer-based models. In this study, we investigate the long-context efficiency issues of the Mamba models and propose ReMamba, which enhances Mamba's ability to comprehend long contexts. ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process, incurring minimal additional inference overhead. Experimental results on the LongBench and L-Eval benchmarks demonstrate ReMamba's efficacy, improving over the baselines by 3.2 and 1.6 points, respectively, and attaining performance almost on par with same-size transformer models.
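As a rough illustration of the selective compression idea summarized above, the following NumPy sketch keeps only the k most salient hidden states within a chosen span and splices them back into the sequence. The scoring rule (cosine similarity to the last hidden state), the function name `compress_span`, and its parameters are assumptions made for illustration, since this section does not specify them. The trainable adaptation in the second stage is not modeled here.

```python
import numpy as np

def compress_span(hidden, start, length, k):
    """Replace hidden[start:start+length] with its k most salient rows.

    hidden: array of shape (seq_len, d_model), final-layer hidden states.
    Saliency is ASSUMED to be cosine similarity with the last hidden state;
    this scoring rule is illustrative and not taken from the text above.
    Returns (compressed_hidden, kept_positions).
    """
    query = hidden[-1]
    span = hidden[start:start + length]
    # Cosine similarity of each span position with the last position.
    denom = np.linalg.norm(span, axis=1) * np.linalg.norm(query)
    scores = span @ query / np.maximum(denom, 1e-12)
    # Indices of the k highest scores, restored to original sequence order.
    keep = np.sort(np.argsort(scores)[-k:])
    compressed = np.concatenate(
        [hidden[:start], span[keep], hidden[start + length:]], axis=0
    )
    return compressed, start + keep

# Usage: compress a 10-position span of a 20-position sequence down to 3 states.
rng = np.random.default_rng(0)
states = rng.standard_normal((20, 8))
shorter, kept = compress_span(states, start=2, length=10, k=3)
print(shorter.shape, kept)
```

The compressed sequence has `seq_len - length + k` positions, so a second forward pass over it processes fewer tokens than the original input.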
