Mnemis: Dual-Route Retrieval on Hierarchical Graphs for Long-Term LLM Memory
Zihao Tang, Xin Yu, Ziyu Xiao, Zengxuan Wen, Zelin Li, Jiaxi Zhou, Hualei Wang, Haohua Wang, Haizhen Huang, Weiwei Deng, Feng Sun, Qi Zhang
TL;DR
The paper addresses long-horizon memory for LLMs with Mnemis, a dual-route retrieval framework that fuses System-1 similarity search on a refined base graph with System-2 global selection over a hierarchical graph. The base graph stores granular memory components (Episodes, Entities, Edges, Episodic Edges), while the hierarchical graph enables top-down, layer-wise reasoning through Category Nodes and Category Edges under principled constraints. Using embedding and BM25 retrieval with Reciprocal Rank Fusion (RRF) re-ranking, plus a top-down global search, Mnemis achieves state-of-the-art performance on LoCoMo (93.9) and LongMemEval-S (91.6) with GPT-4.1-mini, outperforming RAG, Graph-RAG, and several memory baselines. The work shows that combining complementary retrieval routes improves both coverage and structural relevance for long-term memory, with practical implications for persistent AI agents and memory-intensive tasks. Future work includes multimodal extensions and more flexible planning of the global traversal.
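To illustrate the RRF step on the System-1 route, here is a minimal sketch of standard Reciprocal Rank Fusion, which scores each item as the sum of 1/(k + rank) over the input rankings. The function name, the toy episode IDs, and the constant k=60 (the commonly used default) are assumptions for illustration; the summary does not state the paper's exact RRF settings.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of item IDs with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first (e.g. one list from
    embedding search, one from BM25). k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            # Each ranker contributes 1 / (k + rank); top ranks contribute most.
            scores[item] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties broken by ID for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical hits from the two retrievers over base-graph items.
embedding_hits = ["ep3", "ep1", "ep7"]
bm25_hits = ["ep1", "ep9", "ep3"]
print(reciprocal_rank_fusion([embedding_hits, bm25_hits]))
# prints ['ep1', 'ep3', 'ep9', 'ep7']
```

Items found by both retrievers (ep1, ep3) rise to the top even when neither ranker placed them first, which is why RRF is a common, tuning-light way to merge lexical and dense results.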
Abstract
AI memory, specifically how models organize and retrieve historical messages, is becoming increasingly valuable to Large Language Models (LLMs), yet existing methods (RAG and Graph-RAG) primarily retrieve memory through similarity-based mechanisms. While efficient, such System-1-style retrieval struggles in scenarios that require global reasoning or comprehensive coverage of all relevant information. In this work, we propose Mnemis, a novel memory framework that integrates System-1 similarity search with a complementary System-2 mechanism, termed Global Selection. Mnemis organizes memory into a base graph for similarity retrieval and a hierarchical graph that enables top-down, deliberate traversal over semantic hierarchies. By combining the complementary strengths of both retrieval routes, Mnemis retrieves memory items that are both semantically and structurally relevant. Mnemis achieves state-of-the-art performance among all compared methods on long-term memory benchmarks, scoring 93.9 on LoCoMo and 91.6 on LongMemEval-S using GPT-4.1-mini.
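The idea of a top-down, layer-wise Global Selection combined with a similarity route can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the hierarchy is modeled as a plain tree of hypothetical `CategoryNode` objects (ignoring Category Edges and the paper's constraints), a keyword-overlap `keyword_selector` stands in for the deliberate relevance judgment at each layer (which the paper may perform with an LLM), and `dual_route` merges the two routes by simple order-preserving deduplication, an assumed fusion rule.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryNode:
    """Hypothetical category node; leaves hold base-graph memory item IDs."""
    name: str
    keywords: set = field(default_factory=set)
    children: list = field(default_factory=list)
    items: list = field(default_factory=list)


def keyword_selector(query_terms, nodes):
    """Stand-in for a per-layer relevance judgment: keep nodes sharing a term."""
    return [n for n in nodes if n.keywords & query_terms]


def global_selection(root, query_terms, select=keyword_selector):
    """Top-down, layer-wise traversal that expands only selected nodes."""
    frontier, selected_items = [root], []
    while frontier:
        next_frontier = []
        for node in frontier:
            selected_items.extend(node.items)
            next_frontier.extend(select(query_terms, node.children))
        frontier = next_frontier
    return selected_items


def dual_route(system1_hits, system2_hits):
    """Assumed fusion: System-1 order first, then new System-2 items."""
    return list(dict.fromkeys(system1_hits + system2_hits))


# Toy hierarchy: a query about "trip" should cover every trip, not just the
# most similar one.
root = CategoryNode("root", children=[
    CategoryNode("travel", {"trip", "travel"}, children=[
        CategoryNode("japan trip", {"japan", "trip"}, items=["ep2", "ep5"]),
        CategoryNode("paris trip", {"paris", "trip"}, items=["ep8"]),
    ]),
    CategoryNode("work", {"job", "project"}, items=["ep4"]),
])

system2 = global_selection(root, {"trip"})
print(system2)                            # prints ['ep2', 'ep5', 'ep8']
print(dual_route(["ep5", "ep1"], system2))  # prints ['ep5', 'ep1', 'ep2', 'ep8']
```

The traversal prunes whole branches (here, "work") at higher layers, so coverage of a topic comes from the hierarchy rather than from how many items happen to score highly on similarity; the similarity route still contributes items (ep1) that the hierarchy alone might miss.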
