O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents

Piaohong Wang, Motong Tian, Jiaxian Li, Yuan Liang, Yuqing Wang, Qianben Chen, Tiannan Wang, Zhicong Lu, Jiawei Ma, Yuchen Eleanor Jiang, Wangchunshu Zhou

TL;DR

O-Mem tackles the challenge of maintaining long-term, personalized interactions in LLM-powered agents by introducing an Omni Memory System built on active user profiling. It decomposes memory into three specialized components—Persona, Working, and Episodic Memory—with parallel retrieval and hierarchical organization (including LLM-assisted clustering) to sustain coherent personalization. The approach achieves state-of-the-art results on LoCoMo and PERSONAMEM benchmarks while delivering substantial efficiency gains, including reduced token usage and latency, and demonstrates robust memory-time scaling. These findings suggest that dynamic, multi-component memory with structured retrieval can significantly improve both personalization quality and operational efficiency in real-world AI assistants.

Abstract

Recent advancements in LLM-powered agents have demonstrated significant potential in generating human-like responses; however, they continue to face challenges in maintaining long-term interactions within complex environments, primarily due to limitations in contextual consistency and dynamic personalization. Existing memory systems often depend on semantic grouping prior to retrieval, which can overlook semantically irrelevant yet critical user information and introduce retrieval noise. In this report, we propose the initial design of O-Mem, a novel memory framework based on active user profiling that dynamically extracts and updates user characteristics and event records from their proactive interactions with agents. O-Mem supports hierarchical retrieval of persona attributes and topic-related context, enabling more adaptive and coherent personalized responses. O-Mem achieves 51.67% on the public LoCoMo benchmark, a nearly 3% improvement over LangMem, the previous state-of-the-art, and 62.99% on PERSONAMEM, a 3.5% improvement over A-Mem, the previous state-of-the-art. O-Mem also improves token efficiency and interaction response time compared to previous memory frameworks. Our work opens up promising directions for developing efficient and human-like personalized AI assistants in the future.

Paper Structure

This paper contains 13 sections, 12 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Trade-off between performance and efficiency of different memory frameworks. (a) Left panel: Average latency per interaction (MemoryOS latency uses FAISS-CPU for compatibility, a conservative estimate). (b) Right panel: Average computational cost (tokens) per interaction. Results show O-Mem achieves Pareto optimality in efficiency and performance. Note: Token-control experiments were conducted only for the GPT-4.1 runs on LoCoMo; tokens were not controlled for GPT-4o-mini or for the other two datasets.
  • Figure 2: Top: The process of encoding user interactions into memory in O-Mem. Different colors refer to different memory components. O-Mem encodes a user interaction into memory by extracting and recording relevant user attributes and event data into persona memory, episodic memory, and working memory. Bottom: The memory retrieval process for one user interaction in O-Mem. For each new user query, O-Mem retrieves from all three of its memory components.
  • Figure 3: Trade-off between performance and efficiency of different memory frameworks. (a) The left panel compares the average latency per interaction. The MemoryOS latency was evaluated using FAISS-CPU due to compatibility issues on our computing platforms, and thus represents a conservative estimate of its latency. (b) The right panel compares the average computational cost (in tokens) per interaction. Results demonstrate that O-Mem achieves a Pareto-optimal solution in both efficiency and overall performance. Note that we performed token-control experiments only for the GPT-4.1 runs on LoCoMo. We did not control tokens for the experiments on GPT-4o-mini and the other two datasets.
  • Figure 4: Memory profile alignment dynamics during memory-time scaling. More interactions lead to more concise user understanding from O-Mem.
  • Figure 5: Top: The process of encoding user interactions into memory in O-Mem. Different colors refer to different memory components. O-Mem encodes a user interaction into memory by extracting and recording relevant user attributes and event data into persona memory, episodic memory, and working memory. Bottom: The memory retrieval process for one user interaction in O-Mem. For each new user query, O-Mem retrieves from all three of its memory components.
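
The encode-then-retrieve flow described in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' implementation: the class name `ToyOmniMemory`, its fields, the keyword-based episodic index, and the fixed-size working window are all assumptions made for illustration. In the paper, an LLM extracts user attributes and events; here the caller passes attributes in by hand. The sketch only shows the general shape of writing one interaction into three stores and querying all three in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ToyOmniMemory:
    """Hypothetical three-component store; NOT the O-Mem implementation."""

    persona: dict = field(default_factory=dict)    # attribute name -> value
    episodic: dict = field(default_factory=dict)   # keyword -> list of past messages
    working: list = field(default_factory=list)    # most recent messages only
    working_size: int = 3                          # assumed window size

    def encode(self, message: str, attributes: dict) -> None:
        """Write one user interaction into all three components."""
        # Assumption: attributes are supplied by the caller, standing in
        # for the LLM-based extraction step described in the paper.
        self.persona.update(attributes)
        # Assumption: a plain keyword index stands in for episodic memory.
        for word in set(message.lower().split()):
            self.episodic.setdefault(word, []).append(message)
        # Keep only the last `working_size` messages.
        self.working = (self.working + [message])[-self.working_size:]

    def _persona_hits(self, query: str) -> dict:
        return dict(self.persona)

    def _episodic_hits(self, query: str) -> list:
        hits = []
        for word in query.lower().split():
            for msg in self.episodic.get(word, []):
                if msg not in hits:  # de-duplicate, preserve order
                    hits.append(msg)
        return hits

    def _working_hits(self, query: str) -> list:
        return list(self.working)

    def retrieve(self, query: str) -> dict:
        """Query the three components concurrently and merge the results."""
        lookups = {
            "persona": self._persona_hits,
            "episodic": self._episodic_hits,
            "working": self._working_hits,
        }
        with ThreadPoolExecutor(max_workers=3) as pool:
            futures = {name: pool.submit(fn, query) for name, fn in lookups.items()}
            return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    mem = ToyOmniMemory(working_size=2)
    mem.encode("I adopted a cat named Miso", {"pet": "cat"})
    mem.encode("Booked flights to Lisbon", {})
    print(mem.retrieve("tell me about the cat"))
```

In a real system each lookup would typically involve embedding search or an LLM call, which is where running the three retrievals concurrently would matter; the thread pool here only marks where that parallelism would sit.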