Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces

Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury

TL;DR

The paper tackles the difficulty of long-context reasoning in large language models by introducing the Generative Semantic Workspace (GSW), a memory module that builds structured, time-grounded representations of evolving situations. GSW comprises an Operator that derives semantic maps from incoming text and a Reconciler that recursively updates a global workspace, enabling LLMs to track actors, roles, states, and spatiotemporal relations across events. On EpBench, GSW delivers state-of-the-art F1 scores (up to 0.850 on EpBench-200 and 0.773 on EpBench-2000) while using about 51% fewer query-time context tokens than the next most token-efficient baseline, improving both accuracy and efficiency. The work demonstrates the feasibility of endowing LLMs with human-like episodic memory, offering a scalable blueprint for long-horizon reasoning in narrative and real-world domains. It also highlights next steps, including open-model validation, multimodal integration, and broader benchmarking of memory-augmented reasoning in agents.

Abstract

Large Language Models (LLMs) face fundamental challenges in long-context reasoning: many documents exceed their finite context windows, while performance on texts that do fit degrades with sequence length, necessitating their augmentation with external memory frameworks. Current solutions, which have evolved from retrieval using semantic embeddings to more sophisticated structured knowledge graph representations for improved sense-making and associativity, are tailored for fact-based retrieval and fail to build the space-time-anchored narrative representations required for tracking entities through episodic events. To bridge this gap, we propose the Generative Semantic Workspace (GSW), a neuro-inspired generative memory framework that builds structured, interpretable representations of evolving situations, enabling LLMs to reason over evolving roles, actions, and spatiotemporal contexts. Our framework comprises an Operator, which maps incoming observations to intermediate semantic structures, and a Reconciler, which integrates these into a persistent workspace that enforces temporal, spatial, and logical coherence. On the Episodic Memory Benchmark (EpBench) (Huet et al., 2025), comprising corpora ranging from 100k to 1M tokens in length, GSW outperforms existing RAG-based baselines by up to 20%. Furthermore, GSW is highly efficient, reducing query-time context tokens by 51% compared to the next most token-efficient baseline and thereby considerably reducing inference-time costs. More broadly, GSW offers a concrete blueprint for endowing LLMs with human-like episodic memory, paving the way for more capable agents that can reason over long horizons.
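
The Operator/Reconciler split described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the `Observation` fields, the `Workspace` layout, and the single coherence rule (an actor cannot be in two locations at the same timestamp) are assumptions chosen for illustration, and the `operator` function is a stand-in that reads pre-structured facts where the paper uses an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One extracted fact (hypothetical schema): who, in what role, did what, where, when."""
    actor: str
    role: str
    action: str
    location: str
    time: int  # toy integer timestamp


@dataclass
class Workspace:
    """Persistent memory: one time-ordered list of observations per actor."""
    timelines: dict[str, list[Observation]] = field(default_factory=dict)


def operator(chunk: dict) -> list[Observation]:
    """Stand-in for the LLM-driven Operator: the chunk already carries structured facts."""
    return [Observation(**fact) for fact in chunk["facts"]]


def reconcile(ws: Workspace, observations: list[Observation]) -> Workspace:
    """Merge new observations into the workspace, keeping each timeline coherent."""
    for obs in observations:
        timeline = ws.timelines.setdefault(obs.actor, [])
        # Assumed coherence rule: same actor, same time, different place is a conflict.
        for prior in timeline:
            if prior.time == obs.time and prior.location != obs.location:
                raise ValueError(f"{obs.actor} is in two places at time {obs.time}")
        timeline.append(obs)
        timeline.sort(key=lambda o: o.time)  # keep the timeline in temporal order
    return ws


if __name__ == "__main__":
    ws = Workspace()
    chunks = [
        {"facts": [{"actor": "Ada", "role": "speaker", "action": "lectures",
                    "location": "library", "time": 2}]},
        {"facts": [{"actor": "Ada", "role": "guest", "action": "arrives",
                    "location": "station", "time": 1}]},
    ]
    for chunk in chunks:
        reconcile(ws, operator(chunk))
    # The latest entry on the timeline gives the actor's current role and location.
    print(ws.timelines["Ada"][-1])
```

Chunks may arrive out of temporal order, so the sketch sorts each timeline after every merge; a real Reconciler would also need to resolve entity aliases and softer contradictions, which are omitted here.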

Paper Structure

This paper contains 37 sections, 8 equations, 10 figures, 15 tables.

Figures (10)

  • Figure 1: Unifying Brain-Inspired and Generative Semantics for Episodic Memory Modeling. The hippocampal complex (DG, CA3, CA1) and neocortical regions (NC) inspire the Reconciler (retrieval, workspace, update) and Operator (LLM-driven semantic role extraction), respectively. The neocortical complex, responsible for context-rich consolidation and predictive modeling, aligns with the Operator module's functions. The hippocampal complex, which performs indexing, pattern separation, and sequence modeling, corresponds to the Reconciler. Together, the GSW framework offers a biologically inspired, interpretable model for simulating world knowledge from text inputs.
  • Figure 2: Episodic Memory Creation and QA. The figure illustrates the end-to-end process of constructing a workspace and answering questions from it. (top) Large-scale text is segmented into semantically coherent chunks. Each chunk is processed by the Operator model to generate a local workspace instance, represented as a semantic graph. These instances are incrementally integrated by the Reconciler, resulting in a unified Global Memory. (bottom) During question answering, the system retrieves relevant portions of this memory by matching named entities in the query to identifiers in the semantic network. For each match, it reconstructs episodic summaries (contextual recreations of past situations), which are re-ranked and passed to an LLM to generate the final answer.
  • Figure 3: LLM prompt for Operator extraction.
  • Figure 4: LLM prompt for Space Time coupling.
  • Figure 5: LLM prompt for QA reconciliation.
  • ...and 5 more figures
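
The query-time path described in the Figure 2 caption (entity matching, episodic summary reconstruction, re-ranking) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's method: the memory layout (a dict from entity name to event records), the case-insensitive substring matcher, the summary template, and the token-overlap re-ranker are all hypothetical stand-ins, and the final LLM call is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used only by the toy re-ranker."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_entities(query: str, memory: dict) -> list[str]:
    """Return entity identifiers whose name appears in the query (assumed matcher)."""
    lowered = query.lower()
    return [name for name in memory if name.lower() in lowered]


def episodic_summaries(entity: str, memory: dict) -> list[str]:
    """Rebuild one short summary per stored event (hypothetical template)."""
    return [
        f"{entity} {event['action']} at {event['location']} on {event['date']}"
        for event in memory[entity]
    ]


def retrieve(query: str, memory: dict, top_k: int = 2) -> list[str]:
    """Match entities, reconstruct summaries, re-rank by token overlap, keep top_k."""
    query_tokens = tokenize(query)
    candidates = [
        summary
        for name in match_entities(query, memory)
        for summary in episodic_summaries(name, memory)
    ]
    ranked = sorted(
        candidates,
        key=lambda s: len(query_tokens & tokenize(s)),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    memory = {
        "Ada Park": [
            {"action": "gave a lecture", "location": "the city library", "date": "2024-03-02"},
            {"action": "attended a concert", "location": "the harbor hall", "date": "2024-05-10"},
        ],
        "Ben Ruiz": [
            {"action": "gave a lecture", "location": "the town museum", "date": "2024-03-02"},
        ],
    }
    for summary in retrieve("Where did Ada Park give a lecture?", memory, top_k=1):
        print(summary)  # this text would be placed in the LLM prompt as context
```

Only the summaries that survive re-ranking reach the prompt, which is the mechanism by which a workspace of this kind can shrink the query-time context relative to passing raw retrieved chunks.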