Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs
Wanyang Hong, Zhaoning Zhang, Yi Chen, Libo Zhang, Baihui Liu, Linbo Qiao, Zhiliang Tian, Dongsheng Li
TL;DR
Multi-turn LLMs suffer from cumulative contextual decay due to attention pollution, dilution, and drift. Rhea introduces a role-aware memory architecture with an Instructional Memory for persistent global constraints and an Episodic Memory for dynamic interactions, coupled with a heuristic context retrieval mechanism and embedding-level reconstruction to maintain high signal-to-noise context. Empirical results across MT-Bench, MT-Eval, and Long-MT-Bench+ show substantial improvements in long-horizon accuracy and instruction fidelity, including a 16% relative gain and IAR > 8.1, with only modest latency overhead. Ablation studies demonstrate the necessity of both memory streams and the retrieval strategy, underscoring a shift from expanding context windows to improving the quality and structure of context for robust conversational LLMs.
Abstract
Large Language Models (LLMs) have achieved remarkable performance on single-turn tasks, yet their effectiveness deteriorates in multi-turn conversations. We define this phenomenon as cumulative contextual decay: a progressive degradation of contextual integrity caused by attention pollution, dilution, and drift. To address this challenge, we propose Rhea (Role-aware Heuristic Episodic Attention), a novel framework that decouples conversation history into two functionally independent memory modules: (1) an Instructional Memory (IM) that persistently stores high-fidelity global constraints via a structural priority mechanism, and (2) an Episodic Memory (EM) that dynamically manages user-model interactions via asymmetric noise control and heuristic context retrieval. During inference, Rhea constructs a high signal-to-noise context through priority attention, selectively integrating relevant episodic information while always prioritizing global instructions. Experiments on multiple multi-turn conversation benchmarks, including MT-Eval and Long-MT-Bench+, show that Rhea mitigates performance decay and improves overall accuracy by 1.04 points on a 10-point scale (a 16% relative gain over strong baselines). Moreover, Rhea maintains near-perfect instruction fidelity (IAR > 8.1) across long-horizon interactions. These results demonstrate that Rhea provides a principled and effective framework for building more precise, instruction-consistent conversational LLMs.
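To make the two-memory split described above more concrete, here is a minimal Python sketch of a context builder that always places global instructions first and then adds only the most relevant past turns. It is an illustration under stated assumptions, not the authors' implementation: the class `DualMemory`, its methods, the `[instruction]`/`[user]`/`[assistant]` tags, and the token-overlap (Jaccard) scoring are all hypothetical stand-ins. In particular, the overlap score replaces Rhea's heuristic context retrieval, whose details the abstract does not give, and asymmetric noise control and embedding-level reconstruction are not modeled at all.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


def _tokens(text: str) -> Set[str]:
    # Crude whitespace tokenizer; a real system would likely use embeddings.
    return set(text.lower().split())


def overlap_score(query: str, passage: str) -> float:
    # Jaccard similarity between token sets (assumed stand-in for retrieval).
    q, p = _tokens(query), _tokens(passage)
    if not q or not p:
        return 0.0
    return len(q & p) / len(q | p)


@dataclass
class DualMemory:
    # Instructional memory: global constraints, always kept in full.
    instructions: List[str] = field(default_factory=list)
    # Episodic memory: (user, assistant) turn pairs, retrieved selectively.
    episodes: List[Tuple[str, str]] = field(default_factory=list)

    def add_instruction(self, text: str) -> None:
        self.instructions.append(text)

    def add_turn(self, user: str, assistant: str) -> None:
        self.episodes.append((user, assistant))

    def build_context(self, query: str, top_k: int = 2) -> str:
        # Rank past turns by relevance to the current query.
        ranked = sorted(
            range(len(self.episodes)),
            key=lambda i: overlap_score(query, " ".join(self.episodes[i])),
            reverse=True,
        )
        # Keep the top_k turns, restored to chronological order.
        keep = sorted(ranked[:top_k])
        # Instructions come first, regardless of how old they are.
        lines = [f"[instruction] {text}" for text in self.instructions]
        for i in keep:
            user, assistant = self.episodes[i]
            lines.append(f"[user] {user}")
            lines.append(f"[assistant] {assistant}")
        lines.append(f"[user] {query}")
        return "\n".join(lines)


if __name__ == "__main__":
    mem = DualMemory()
    mem.add_instruction("Answer in French.")
    mem.add_turn("What is the capital of Italy?", "Rome")
    mem.add_turn("Tell me a joke about cats", "Why did the cat sit?")
    mem.add_turn("What is the capital of Spain?", "Madrid")
    print(mem.build_context("What is the capital of Portugal?", top_k=2))
```

In this toy setup the off-topic joke turn is dropped from the prompt while the instruction is always retained, which mirrors the abstract's stated goal of a high signal-to-noise context in a very simplified form.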
