ENGRAM: Effective, Lightweight Memory Orchestration for Conversational Agents
Daivik Patel, Shrenik Patel
TL;DR
This work tackles long-horizon consistency in conversational LLMs without resorting to large, complex memory architectures. It introduces ENGRAM, a compact memory system that partitions memories into episodic, semantic, and procedural types, ties them together with a single router and dense retrieval, and keeps records in a local SQLite database. ENGRAM achieves state-of-the-art semantic correctness on LoCoMo and surpasses a full-context baseline on LongMemEval while using roughly 1% of the tokens, a large efficiency gain that comes without sacrificing accuracy. The findings suggest that careful memory typing combined with straightforward retrieval can give chat agents scalable, reproducible long-term memory. The authors provide a reproducible implementation and evaluation harness to encourage adoption and further research.
Abstract
Large language models (LLMs) deployed in user-facing applications require long-horizon consistency: the ability to remember prior interactions, respect user preferences, and ground reasoning in past events. However, contemporary memory systems often adopt complex architectures such as knowledge graphs, multi-stage retrieval pipelines, and OS-style schedulers, which introduce engineering complexity and reproducibility challenges. We present ENGRAM, a lightweight memory system that organizes conversations into three canonical memory types (episodic, semantic, and procedural) through a single router and retriever. Each user turn is converted into typed memory records, each with a normalized schema and an embedding, and stored in a database. At query time, the system retrieves the top-k dense neighbors for each type, merges the results with simple set operations, and provides the most relevant evidence as context to the model. ENGRAM attains state-of-the-art results on LoCoMo, a multi-session conversational QA benchmark for long-horizon memory, and exceeds the full-context baseline by 15 points on LongMemEval while using only about 1% of the tokens. These results show that careful memory typing and straightforward dense retrieval can enable effective long-term memory management in language models without requiring complex architectures.
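The write path (route a turn, embed it, store typed records) and the read path (top-k per type, set-based merge) described above can be sketched in Python. This is a minimal, hypothetical illustration, not the ENGRAM implementation. The keyword router `route`, the hashed bag-of-words `embed` (a stand-in for a real dense encoder), the SQLite table layout, and the parameters `k` and `budget` are all assumptions made for the sketch. Here a turn may receive several types, and the merge is a set union keyed by `turn_id`, so a turn stored under more than one type appears only once in the final context.

```python
import math
import re
import sqlite3
import struct
import zlib

MEMORY_TYPES = ("episodic", "semantic", "procedural")
DIM = 256  # hypothetical embedding width


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def embed(text: str) -> list[float]:
    """Toy stand-in for a dense encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def route(turn: str) -> list[str]:
    """Hypothetical keyword router (assumption, not ENGRAM's router): every turn
    is an episodic event; preferences/facts and standing instructions also get
    semantic and procedural records respectively."""
    padded = f" {' '.join(tokenize(turn))} "
    types = ["episodic"]
    if any(f" {k} " in padded for k in ("i like", "i prefer", "my name", "i am")):
        types.append("semantic")
    if any(f" {k} " in padded for k in ("always", "never", "how to", "step")):
        types.append("procedural")
    return types


class MemoryStore:
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory "
            "(turn_id INTEGER, mtype TEXT, text TEXT, emb BLOB)"
        )
        self.next_turn = 0

    def write(self, turn: str) -> list[str]:
        """Route one user turn and store one embedded record per assigned type."""
        types = route(turn)
        blob = struct.pack(f"{DIM}f", *embed(turn))
        self.db.executemany(
            "INSERT INTO memory VALUES (?, ?, ?, ?)",
            [(self.next_turn, t, turn, blob) for t in types],
        )
        self.db.commit()
        self.next_turn += 1
        return types

    def top_k(self, mtype: str, qvec: list[float], k: int):
        """Brute-force cosine top-k within one memory type (vectors are unit-norm)."""
        rows = self.db.execute(
            "SELECT turn_id, text, emb FROM memory WHERE mtype = ?", (mtype,)
        ).fetchall()
        scored = [
            (sum(a * b for a, b in zip(qvec, struct.unpack(f"{DIM}f", emb))), tid, text)
            for tid, text, emb in rows
        ]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return scored[:k]

    def retrieve(self, query: str, k: int = 2, budget: int = 4) -> list[str]:
        """Top-k per type, then a set union keyed by turn_id, capped at `budget`."""
        qvec = embed(query)
        pool: dict[int, tuple[float, str]] = {}
        for mtype in MEMORY_TYPES:
            for score, tid, text in self.top_k(mtype, qvec, k):
                if tid not in pool or score > pool[tid][0]:
                    pool[tid] = (score, text)
        ranked = sorted(pool.items(), key=lambda kv: (-kv[1][0], kv[0]))
        return [text for _, (_, text) in ranked[:budget]]


if __name__ == "__main__":
    store = MemoryStore()
    store.write("I prefer green tea over coffee")
    store.write("Yesterday we talked about my trip to Lisbon")
    store.write("Always answer me in short bullet points")
    for line in store.retrieve("What tea do I prefer?"):
        print(line)
```

Scoring every row of a type by brute force is adequate for a small sketch. A larger store would more likely use an approximate nearest-neighbor index, and a real system would replace both the keyword router and the hashed embedding with learned components.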
