Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces
Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury
TL;DR
The paper tackles the difficulty of long-context reasoning in large language models by introducing the Generative Semantic Workspace (GSW), a memory module that builds structured, time-grounded representations of evolving situations. GSW comprises an Operator that derives semantic maps from incoming text and a Reconciler that recursively updates a global workspace, enabling LLMs to track actors, roles, states, and spatiotemporal relations across events. On EpBench, GSW delivers state-of-the-art F1 scores (up to 0.850 on EpBench-200 and 0.773 on EpBench-2000) and reduces query-time context tokens by about 51% relative to the next most token-efficient baseline, improving both accuracy and efficiency. The work demonstrates the feasibility of endowing LLMs with human-like episodic memory, offering a scalable blueprint for long-horizon reasoning in narrative and real-world domains. It also highlights next steps, including validation on open models, multimodal integration, and broader benchmarking of memory-augmented reasoning in agents.
Abstract
Large Language Models (LLMs) face fundamental challenges in long-context reasoning: many documents exceed their finite context windows, while performance on texts that do fit degrades with sequence length, necessitating their augmentation with external memory frameworks. Current solutions, which have evolved from retrieval using semantic embeddings to more sophisticated structured knowledge graph representations for improved sense-making and associativity, are tailored for fact-based retrieval and fail to build the space-time-anchored narrative representations required for tracking entities through episodic events. To bridge this gap, we propose the \textbf{Generative Semantic Workspace} (GSW), a neuro-inspired generative memory framework that builds structured, interpretable representations of evolving situations, enabling LLMs to reason over evolving roles, actions, and spatiotemporal contexts. Our framework comprises an \textit{Operator}, which maps incoming observations to intermediate semantic structures, and a \textit{Reconciler}, which integrates these into a persistent workspace that enforces temporal, spatial, and logical coherence. On the Episodic Memory Benchmark (EpBench) \cite{huet_episodic_2025}, which comprises corpora ranging from 100k to 1M tokens in length, GSW outperforms existing RAG-based baselines by up to \textbf{20\%}. Furthermore, GSW is highly efficient: it reduces query-time context tokens by \textbf{51\%} compared to the next most token-efficient baseline, which considerably lowers inference-time costs. More broadly, GSW offers a concrete blueprint for endowing LLMs with human-like episodic memory, paving the way for more capable agents that can reason over long horizons.
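To make the Operator/Reconciler division of labor concrete, the following is a minimal toy sketch in Python and not the authors' implementation. Everything in it is an assumption made for illustration: the `Observation` schema, the pipe-delimited input format, the integer timestamps, and the names `operator`, `Workspace`, `reconcile`, and `state_at` are all hypothetical. In particular, the Operator described above derives semantic structures from raw text, whereas the stand-in here only parses an already-structured line, and the Reconciler stand-in enforces temporal ordering only.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Observation:
    """Hypothetical semantic structure for one event involving one actor."""
    time: int       # assumed ordinal timestamp
    actor: str
    role: str
    state: str
    location: str


def operator(line: str) -> Observation:
    """Toy Operator stand-in: parses 'time|actor|role|state|location'.

    The real Operator maps free text to semantic structures; this parser
    only illustrates the interface (observation in, structure out).
    """
    time, actor, role, state, location = [p.strip() for p in line.split("|")]
    return Observation(int(time), actor, role, state, location)


@dataclass
class Workspace:
    """Toy persistent workspace: one time-ordered timeline per actor."""
    timelines: Dict[str, List[Observation]] = field(default_factory=dict)

    def reconcile(self, obs: Observation) -> None:
        """Toy Reconciler stand-in: keeps each timeline sorted by time.

        If an observation for the same actor and time already exists,
        the newer one replaces it (an assumed conflict policy).
        """
        timeline = self.timelines.setdefault(obs.actor, [])
        times = [o.time for o in timeline]
        idx = bisect.bisect_left(times, obs.time)
        if idx < len(timeline) and timeline[idx].time == obs.time:
            timeline[idx] = obs
        else:
            timeline.insert(idx, obs)

    def state_at(self, actor: str, time: int) -> Optional[Observation]:
        """Return the latest observation of `actor` at or before `time`."""
        timeline = self.timelines.get(actor, [])
        times = [o.time for o in timeline]
        idx = bisect.bisect_right(times, time)
        return timeline[idx - 1] if idx else None


# Observations may arrive out of order; reconciliation restores the timeline.
ws = Workspace()
for line in [
    "3|Alice|witness|testifying|courthouse",
    "1|Alice|bystander|walking|Main Street",
    "2|Bob|suspect|fleeing|Main Street",
]:
    ws.reconcile(operator(line))
```

A query such as `ws.state_at("Alice", 2)` then returns only the single relevant observation rather than the full source text, which is the intuition behind supplying a compact, structured context at query time. The actual framework also enforces spatial and logical coherence, which this sketch does not attempt.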
