Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Yaorui Shi, Yuxin Chen, Siyuan Wang, Sihang Li, Hengxing Cai, Qi Gu, Xiang Wang, An Zhang
TL;DR
This work tackles long-context question answering where critical evidence is dispersed across vast corpora. It introduces ReMemR1, a memory-augmented agent that uses a history-augmented state with callback queries to retrieve from the full memory history, enabling non-linear reasoning and revisiting early evidence. To train such a system, the authors propose RLMLR, a reinforcement learning framework that combines trajectory-level final-answer rewards with dense, step-level rewards to guide memory usage and retrieval. Empirical results on HotpotQA and 2WikiMultiHopQA demonstrate significant gains over baselines, including strong generalization to out-of-distribution data and robustness under distant-evidence settings, with ablations validating the effectiveness of both RLMLR and the RL-driven memory callback.
Abstract
Large language models face challenges in long-context question answering, where the key evidence for a query may be dispersed across millions of tokens. Existing works equip large language models with a memory corpus that is dynamically updated during a single-pass document scan, an approach known as "memorize while reading." While this approach scales efficiently, it suffers from irreversible forward-only processing, information loss through overwriting, and sparse reinforcement learning signals. To tackle these challenges, we present ReMemR1, a memory-augmented agent with callback-enhanced memory that allows selective retrieval from the entire memory history, enabling non-linear reasoning and revisiting of early evidence. To further strengthen training, we propose Reinforcement Learning with Multi-Level Rewards (RLMLR), which combines final-answer rewards with dense, step-level signals that guide effective memory use. Together, these contributions mitigate information degradation, improve supervision, and support multi-hop memory utilization. Experiments on long-document QA show significant gains over existing memory-based approaches, validating ReMemR1 as an effective solution for long-context reasoning agents.
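The two ideas in the abstract, a memory history that can be revisited and a reward that mixes final-answer and step-level signals, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `MemoryHistory`, `callback`, and `combined_reward` are hypothetical, token overlap stands in for whatever retrieval ReMemR1 actually uses, and the weighted sum is only one plausible way to combine the two reward levels.

```python
from dataclasses import dataclass, field
from typing import Optional


def _tokens(text: str) -> set:
    """Lowercased whitespace tokens; a toy stand-in for real retrieval features."""
    return set(text.lower().split())


@dataclass
class MemoryHistory:
    """Hypothetical store that keeps every memory state instead of overwriting it."""
    states: list = field(default_factory=list)

    def update(self, new_memory: str) -> None:
        # Append rather than overwrite, so early evidence stays retrievable.
        self.states.append(new_memory)

    def callback(self, query: str) -> Optional[str]:
        """Return the stored state with the largest token overlap with `query`."""
        if not self.states:
            return None
        q = _tokens(query)
        best = max(self.states, key=lambda s: len(q & _tokens(s)))
        # Report no match when even the best state shares no token with the query.
        return best if q & _tokens(best) else None


def combined_reward(final_reward: float, step_rewards: list, weight: float = 0.5) -> float:
    """Assumed mix: trajectory-level reward plus a weighted mean of step-level rewards."""
    dense = sum(step_rewards) / len(step_rewards) if step_rewards else 0.0
    return final_reward + weight * dense


history = MemoryHistory()
history.update("Alice was born in Paris")
history.update("Bob works at Acme")
recalled = history.callback("where was Alice born")
```

In this sketch, a later step can still recover the first memory state even though a newer one has been written since, which is the behavior a forward-only, overwrite-based memory cannot provide.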
