Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs
Aneesh Jonelagadda, Christina Hahn, Haoze Zheng, Salvatore Penachio
TL;DR
Mnemosyne addresses the challenge of long-term memory for edge-based LLMs by introducing a graph-structured, unsupervised memory with modular intake filters, probabilistic recall with temporal decay, and a core summary module to capture persistent user traits. The architecture combines commitment, recall, core-supersummary updates, and pruning to store salient memories within device memory constraints, enabling efficient retrieval through a hybrid similarity and time-aware recall mechanism. Empirically, Mnemosyne achieves state-of-the-art LoCoMo performance in temporal reasoning and strong human-evaluated realism, outperforming naive RAG baselines and maintaining robust edge-compatibility. These results demonstrate that reliable factual recall, temporal reasoning, and natural user-facing responses are feasible with an unsupervised, memory-graph approach on edge devices, with practical implications for longitudinal healthcare assistants and other personalized edge applications.
Abstract
Long-term memory is essential for natural, realistic dialogue. However, current large language model (LLM) memory systems rely on either brute-force context expansion or static retrieval pipelines that fail on edge-constrained devices. We introduce Mnemosyne, an unsupervised, human-inspired long-term memory architecture designed for edge-based LLMs. Our approach uses graph-structured storage, modular substance and redundancy filters, memory committing and pruning mechanisms, and probabilistic recall with temporal decay and refresh processes modeled after human memory. Mnemosyne also introduces a concentrated "core summary", efficiently derived from a fixed-length subset of the memory graph, that captures the user's personality and other domain-specific long-term details; in a healthcare application, for example, these include post-recovery ambitions and attitude towards care. Unlike existing retrieval-augmented methods, Mnemosyne is designed for longitudinal healthcare assistants, where naive retrieval is limited by repetitive, semantically similar but temporally distinct conversations. In experiments with longitudinal healthcare dialogues, Mnemosyne achieves the highest win rate, 65.8%, in blind human evaluations of realism and long-term memory capability, compared to a baseline RAG win rate of 31.1%. Mnemosyne also achieves the highest LoCoMo benchmark scores to date in temporal reasoning and single-hop retrieval among techniques using the same backbone. Further, its average overall score of 54.6% is the second highest across all methods, beating the commonly used Mem0 and OpenAI baselines, among others. These results demonstrate that improved factual recall, enhanced temporal reasoning, and markedly more natural user-facing responses are feasible with an edge-compatible and easily transferable unsupervised memory architecture.
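To make the idea of probabilistic recall with temporal decay and refresh concrete, the following is a minimal sketch under stated assumptions; it is not the paper's actual formulation. The exponential half-life decay, the multiplicative combination with similarity, and every name here (`MemoryNode`, `recall_probability`, `recall`, `half_life_hours`) are hypothetical choices made for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class MemoryNode:
    text: str
    similarity: float  # assumed: precomputed query similarity in [0, 1]
    age_hours: float   # assumed: time since last commit or refresh


def recall_probability(node: MemoryNode, half_life_hours: float = 72.0) -> float:
    """Similarity scaled by an exponential decay factor (an assumed form)."""
    decay = 0.5 ** (node.age_hours / half_life_hours)
    return node.similarity * decay


def recall(nodes, rng: random.Random, half_life_hours: float = 72.0):
    """Sample each memory independently; recalled memories are refreshed."""
    recalled = []
    for node in nodes:
        if rng.random() < recall_probability(node, half_life_hours):
            node.age_hours = 0.0  # refresh: recall resets the decay clock
            recalled.append(node)
    return recalled


if __name__ == "__main__":
    memories = [
        MemoryNode("knee surgery scheduled", similarity=0.9, age_hours=2.0),
        MemoryNode("prefers morning walks", similarity=0.6, age_hours=400.0),
    ]
    for m in recall(memories, random.Random(0)):
        print(m.text)
```

In this sketch, an old memory can still surface if it is highly similar to the query, and a recalled memory becomes easier to recall again because its age is reset, which loosely mirrors the refresh process described above.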
