MemX: A Local-First Long-Term Memory System for AI Assistants

Lizheng Sun

Abstract

We present MemX, a local-first long-term memory system for AI assistants with stability-oriented retrieval design. MemX is implemented in Rust on top of libSQL and an OpenAI-compatible embedding API, providing persistent, searchable, and explainable memory for conversational agents. Its retrieval pipeline applies vector recall, keyword recall, Reciprocal Rank Fusion (RRF), four-factor re-ranking, and a low-confidence rejection rule that suppresses spurious recalls when no answer exists in the memory store. We evaluate MemX on two axes. First, two custom Chinese-language benchmark suites (43 queries, <=1,014 records) validate pipeline design: Hit@1=91.3% on a default scenario and 100% under high confusion, with conservative miss-query suppression. Second, the LongMemEval benchmark (500 queries, up to 220,349 records) quantifies system boundaries across four ability types and three storage granularities. At fact-level granularity the system reaches Hit@5=51.6% and MRR=0.380, doubling session-level performance, while temporal and multi-session reasoning remain challenging (<=43.6% Hit@5). FTS5 full-text indexing reduces keyword search latency by 1,100x at 100k-record scale, keeping end-to-end search under 90 ms. Unlike Mem0 and related work that targets end-to-end agent benchmarks, MemX focuses on a narrower, reproducible baseline: local-first deployment, structural simplicity, explainable retrieval, and stability-oriented design.
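The abstract reports results as Hit@k and MRR. As a reading aid, the sketch below computes both under their standard definitions: Hit@k is the fraction of queries with at least one relevant record in the top k, and MRR is the mean of 1/rank of the first relevant record (0 when none is retrieved). The function names and toy data are hypothetical, and the paper's exact evaluation protocol may differ in details not stated here.

```python
def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant id appears in the top-k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant id (1-based); 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k):
    """Average Hit@k and MRR over (ranked_ids, relevant_ids) pairs."""
    n = len(runs)
    hit = sum(hit_at_k(ranked, rel, k) for ranked, rel in runs) / n
    mrr = sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / n
    return hit, mrr


# Toy data (hypothetical): three queries with their ranked results and gold ids.
runs = [
    (["a", "b", "c"], {"a"}),  # relevant at rank 1
    (["x", "y", "z"], {"z"}),  # relevant at rank 3
    (["p", "q"], {"r"}),       # relevant record never retrieved
]
hit1, mrr = evaluate(runs, k=1)
hit5, _ = evaluate(runs, k=5)
```

On this toy data, Hit@1 is 1/3, Hit@5 is 2/3, and MRR is (1 + 1/3 + 0)/3 = 4/9.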

Paper Structure (52 sections, 5 equations, 2 figures, 13 tables, 1 algorithm)

Figures (2)

  • Figure 1: MemX search pipeline. A query is embedded by Qwen3-0.6B (1024-dim) and routed to two parallel recall paths: DiskANN/brute-force vector search and FTS5 keyword matching. Results are fused via RRF ($k{=}60$), re-ranked by four weighted factors (semantic similarity, recency, importance, frequency), and filtered by a rejection gate that returns $\varnothing$ when similarity falls below threshold $\tau$. After deduplication the top-$k$ results are returned and retrieval statistics are written back to the database.
  • Figure 2: Vector vs. keyword search latency across custom and LongMemEval scenarios (log scale). At 100k records, LIKE-based keyword search dominates total latency (3,305 ms); FTS5 indexing reduces it to 2.9 ms ($1{,}100{\times}$ speedup).
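The fusion and rejection steps named in the Figure 1 caption can be illustrated with a short sketch. MemX itself is implemented in Rust; this Python version is only an approximation of the idea, not the authors' code. It applies the standard RRF formula, score(d) = Σ 1/(k + rank), with k = 60 as in the caption. The rejection gate is assumed to compare the best candidate's semantic similarity against τ; the caption does not say exactly which similarity is tested, and the value τ = 0.5 used here is a placeholder. The four-factor re-ranking stage is omitted because its weights are not given in this section.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked id lists.

    Each list contributes 1 / (k + rank) per id, with 1-based ranks.
    Returns (id, score) pairs sorted by descending score, ties broken by id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def rejection_gate(fused, similarity, tau):
    """Return an empty list when no candidate reaches similarity tau.

    Assumption: the gate looks at the highest semantic similarity among the
    fused candidates; ids missing from `similarity` count as 0.0.
    """
    if not fused:
        return []
    best = max(similarity.get(doc_id, 0.0) for doc_id, _ in fused)
    return fused if best >= tau else []


# Hypothetical recall results from the two parallel paths.
vector_hits = ["m3", "m1", "m7"]
keyword_hits = ["m1", "m9", "m3"]
fused = rrf_fuse([vector_hits, keyword_hits])

# Hypothetical cosine similarities for the fused candidates.
sims = {"m1": 0.82, "m3": 0.79, "m9": 0.40, "m7": 0.35}
accepted = rejection_gate(fused, sims, tau=0.5)
rejected = rejection_gate(fused, {"m1": 0.2, "m3": 0.1}, tau=0.5)
```

Here `m1` ranks first (1/62 + 1/61) ahead of `m3` (1/61 + 1/63) because it places well in both lists. The second gate call returns an empty list, mirroring the $\varnothing$ outcome in the caption when nothing in the store is similar enough to the query.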