EcphoryRAG: Re-Imagining Knowledge-Graph RAG via Human Associative Memory

Zirui Liao

TL;DR

EcphoryRAG is a cue-driven, memory-inspired RAG framework that casts retrieval as an ecphoric process over an entity-centric memory graph. Offline, it builds Engrams and an associative knowledge graph with multi-granularity indices. Online, it performs cue extraction, then a multi-hop associative search guided by weighted embeddings, then a final re-ranking anchored to the original query, so that answers rest on grounded, multi-step reasoning. Across 2WikiMultiHopQA, HotpotQA, and MuSiQue, EcphoryRAG achieves state-of-the-art EM and F1 scores while substantially reducing offline indexing costs, demonstrating both high reasoning quality and practical efficiency. By modeling human memory principles, the approach offers a scalable, adaptable pathway for structured RAG that can enable continual learning and goal-oriented retrieval in AI systems.

Abstract

Cognitive neuroscience research indicates that humans leverage cues to activate entity-centered memory traces (engrams) for complex, multi-hop recollection. Inspired by this mechanism, we introduce EcphoryRAG, an entity-centric knowledge graph RAG framework. During indexing, EcphoryRAG extracts and stores only core entities with corresponding metadata, a lightweight approach that reduces token consumption by up to 94% compared to other structured RAG systems. For retrieval, the system first extracts cue entities from queries, then performs a scalable multi-hop associative search across the knowledge graph. Crucially, EcphoryRAG dynamically infers implicit relations between entities to populate context, enabling deep reasoning without exhaustive pre-enumeration of relationships. Extensive evaluations on the 2WikiMultiHopQA, HotpotQA, and MuSiQue benchmarks demonstrate that EcphoryRAG sets a new state of the art, improving the average Exact Match (EM) score from 0.392 to 0.474 over strong KG-RAG methods like HippoRAG. These results validate the efficacy of the entity-cue-multi-hop retrieval paradigm for complex question answering.
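The EM and F1 scores cited above are the standard answer-level metrics for multi-hop QA. The following is a minimal sketch of how they are commonly computed, assuming SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the paper's exact evaluation script is not shown here, so the helper names `normalize`, `exact_match`, and `f1_score` are illustrative assumptions.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace (assumed SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy answers, not taken from any benchmark.
em = exact_match("The Sorbonne", "sorbonne")
f1 = f1_score("University of Paris", "Paris")

# Relative size of the average EM gain reported in the abstract.
relative_gain = (0.474 - 0.392) / 0.392
```

Under these definitions, the reported move in average EM from 0.392 to 0.474 is an absolute gain of 0.082, or roughly 21% relative to the baseline.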

Paper Structure

This paper contains 31 sections, 2 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: The cognitive principle of cued recall that inspires our work. In a classic memory experiment, a general prompt (Free Recall) often leads to incomplete retrieval. However, specific Cues (e.g., "domestic" or "foreign") activate targeted memory traces, enabling complete and accurate recall. EcphoryRAG is designed to operationalize this powerful principle for complex question answering.
  • Figure 2: A comparison of retrieval paradigms. (Top) Naive RAG: A question directly retrieves information from a monolithic vector base using a single semantic search step. This process is simple but often fails to connect disparate facts. (Bottom) Ecphory-inspired RAG: Our proposed workflow, which mimics human memory. A question is first processed to Extract specific Cues. These cues then trigger a targeted Recall process (Ecphory) from a structured knowledge base of Engrams. The retrieved, relevant information is then synthesized by an LLM (the "brain") to produce the final, reasoned answer.
  • Figure 3: The detailed end-to-end workflow of EcphoryRAG, separated into an offline Index Phase and an online QA Phase. Index Phase (top): Raw documents are processed by an LLM to extract structured Engrams (e.g., the entity '1844' with its type, description, importance score, and other attributes). These engrams are stored in a multi-granular memory system, comprising a Knowledge Graph Base for relational structure and a Text Chunk Base for source evidence. QA Phase (bottom): The process begins with a user's Question. (1) Cue Extraction: An LLM identifies key entities to serve as retrieval Cues. (2) Associative Search: These cues initiate a multi-hop search across the knowledge graph, activating related engrams. (3) Context Grounding: The system retrieves the original text chunks associated with the activated engrams. (4) Generation: The retrieved engrams and their grounded text are combined into a rich context, which the LLM uses to perform step-by-step reasoning and generate the final answer.
  • Figure 4: Ablation on context components on the 2Wiki dataset. The full "Entity+Chunk" method (orange) vastly outperforms the "Entity-Only" approach (blue), though it is more sensitive to noise at larger values of $k$.
  • Figure 5: Performance of EcphoryRAG with varying numbers of retrieved passages ($k$). The optimal value of $k$ is dataset-dependent, reflecting differences in evidence dispersion and noise sensitivity.
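
The QA-phase steps described in the Figure 3 caption (cue extraction, associative search, context grounding) can be illustrated with a toy sketch. This is not the authors' implementation: the `Engram` fields, the string-matching stand-in for LLM cue extraction, the activation-weighted probe vector, the hand-written 3-dimensional embeddings, and all function names are assumptions chosen only to make the control flow concrete. The generation step is omitted because it requires an LLM.

```python
import math
from dataclasses import dataclass


@dataclass
class Engram:
    """Hypothetical engram record: an entity with metadata and source pointers."""
    name: str
    description: str
    embedding: list   # toy vector; a real system would use an embedding model
    chunk_ids: list   # ids of the source text chunks that mention the entity


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extract_cues(question, memory):
    # Stand-in for LLM cue extraction: keep entity names that occur in the question.
    q = question.lower()
    return [name for name in memory if name.lower() in q]


def associative_search(cues, memory, hops=2, k=1):
    # Activation weight per engram; cue engrams start fully activated.
    activated = {name: 1.0 for name in cues}
    for _ in range(hops):
        if not activated:
            break
        # Probe = activation-weighted mean of the embeddings activated so far.
        total = sum(activated.values())
        dim = len(next(iter(memory.values())).embedding)
        probe = [
            sum(w * memory[n].embedding[i] for n, w in activated.items()) / total
            for i in range(dim)
        ]
        scored = sorted(
            ((cosine(probe, e.embedding), n)
             for n, e in memory.items() if n not in activated),
            reverse=True,
        )
        # Activate the k nearest new engrams; their similarity becomes their weight.
        for score, name in scored[:k]:
            if score > 0:
                activated[name] = score
    return activated


def ground_and_rerank(activated, memory, query_embedding):
    # Re-rank activated engrams against the original query and attach their chunks.
    ranked = sorted(
        activated,
        key=lambda n: cosine(query_embedding, memory[n].embedding),
        reverse=True,
    )
    return [(n, memory[n].chunk_ids) for n in ranked]


# Toy memory with made-up embeddings and chunk ids.
memory = {e.name: e for e in [
    Engram("Marie Curie", "physicist", [1.0, 0.2, 0.0], [0]),
    Engram("Pierre Curie", "physicist, husband of Marie Curie", [0.9, 0.4, 0.1], [0, 1]),
    Engram("Sorbonne", "university in Paris", [0.3, 0.9, 0.2], [1]),
    Engram("Mount Everest", "mountain", [0.0, 0.1, 1.0], [2]),
]}

question = "Where did the husband of Marie Curie teach?"
query_embedding = [0.4, 0.9, 0.1]  # assumed embedding of the question

cues = extract_cues(question, memory)
activated = associative_search(cues, memory, hops=2, k=1)
ranked = ground_and_rerank(activated, memory, query_embedding)
print(ranked)
```

In this toy run the single cue reaches a related engram on the first hop and a second one on the next, while the unrelated engram is never activated. Anchoring the final ranking to the query embedding rather than to the probe is one plausible reading of the re-ranking step described in the TL;DR.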